Suggestions for summarizing buildbot test results

Thorkil Naur naur at post11.tele.dk
Wed Apr 4 08:37:09 EDT 2007


Hello,

I have followed with interest the development of the various reporting formats 
used for the daily buildbot test results, and here are some suggestions. My 
main concern is to make it easy to follow the state of each individual test 
case in some detail. That way it becomes easier to see which test cases may 
need closer investigation, and perhaps also to identify environments that are 
particularly problematic for a given test case.

The basic idea is to have a table like the following for each combination of 
test case and way, such as barton-mangler-bug(optasm). Please use a fixed-width 
font to view it:

                      0123456789 Summary
  tatd2 x86 OSX head
  tatd2 x86 OSX 6.6         ++++ *
  tatd2 PPC OSX head  .|.|.|.|.| |
  tatd2 PPC OSX 6.6   ||--       *
  tnaur PPC OSX 6.6   .-.-.-.-.- -
  Summary             **** *****

In this table, time runs across and builders run down, so each column 
represents a time interval. I think of each interval as a day, since that is 
how often builds happen in most cases, although I know that this is not 
always true. But let us leave that problem for later. In the example, I have 
arbitrarily shown 10 time intervals, which could be the latest 10 days.

Each row represents a builder, and each cell contains a single character 
indicating the result of the test case as run by that builder on that day.

The character represents the result using this code:

  Expected OK:       (i.e. blank)
  Unexpected OK:   -
  Expected Fail:   |
  Unexpected Fail: +
  No information:  .
  Mixed:           * (used in summaries)

In selecting these particular characters, I have tried to be clever: the 
Expected/Unexpected property is represented by blank/- and the OK/Fail 
property by blank/|, so that combining Unexpected with Fail gives +. But this 
choice, not least the use of blank to represent Expected OK, is certainly 
debatable.

"No information" would be used when a particular builder has reported nothing 
for the test case on that particular day.
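To make the encoding concrete, here is a minimal Haskell sketch of it. The names Expectation, Outcome, Cell and toChar are hypothetical, chosen only for illustration; Mixed is left out because it only arises in summaries. Each of the two properties contributes its own stroke, which is why Unexpected Fail comes out as +:

```haskell
-- Hypothetical types for one cell of the table.
data Expectation = Expected | Unexpected deriving (Eq, Show)
data Outcome     = OK | Fail             deriving (Eq, Show)

-- A cell is either a result or "No information" for that builder and day.
-- Mixed ('*') is deliberately absent: it only arises in summaries.
data Cell = Seen Expectation Outcome | NoInfo deriving (Eq, Show)

-- Unexpected contributes a '-' stroke and Fail a '|' stroke;
-- both strokes together make '+', neither leaves a blank.
toChar :: Cell -> Char
toChar NoInfo     = '.'
toChar (Seen e o) = case (e == Unexpected, o == Fail) of
  (False, False) -> ' '  -- Expected OK
  (True,  False) -> '-'  -- Unexpected OK
  (False, True)  -> '|'  -- Expected Fail
  (True,  True)  -> '+'  -- Unexpected Fail

main :: IO ()
main = putStrLn ("[" ++ map toChar cells ++ "]")
  where
    cells = [ Seen Expected OK, Seen Unexpected OK
            , Seen Expected Fail, Seen Unexpected Fail, NoInfo ]
```

Running it prints [ -|+.], the codes for Expected OK, Unexpected OK, Expected Fail, Unexpected Fail and No information, in that order (the brackets only make the leading blank visible).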

So the table tells us, for example, that the builder tatd2 x86 OSX head has 
passed the test as expected all 10 times, and that tatd2 x86 OSX 6.6 started 
to fail unexpectedly on day 6.

The summaries are constructed in what I hope is an obvious manner: if all the 
results in a row or column, apart from any "No information" entries, are the 
same, the summary shows that single, unchanged result; otherwise it shows * to 
indicate that the result has changed.
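The summary rule can be sketched in Haskell as follows, working directly on the result characters. The function name summarise is hypothetical, and returning . when there is nothing but "No information" to summarise is an assumption, since the example does not cover that case. Applied to the example table above, it reproduces both the Summary column and the Summary row:

```haskell
import Data.List (nub, transpose)

-- Summarise a list of result characters: "No information" ('.') is
-- ignored; a single repeated result is shown as itself; anything else
-- is shown as '*' (changed). Returning '.' when only '.' entries remain
-- is an assumption not covered by the example.
summarise :: String -> Char
summarise cells = case nub (filter (/= '.') cells) of
  []  -> '.'
  [c] -> c
  _   -> '*'

-- The example table, one 10-character string (days 0-9) per builder.
table :: [(String, String)]
table =
  [ ("tatd2 x86 OSX head", "          ")
  , ("tatd2 x86 OSX 6.6",  "      ++++")
  , ("tatd2 PPC OSX head", ".|.|.|.|.|")
  , ("tatd2 PPC OSX 6.6",  "||--      ")
  , ("tnaur PPC OSX 6.6",  ".-.-.-.-.-")
  ]

main :: IO ()
main = do
  -- Summary column: one summary per builder, over all days.
  mapM_ (\(name, row) -> putStrLn (name ++ ": [" ++ [summarise row] ++ "]")) table
  -- Summary row: one summary per day, over all builders.
  putStrLn ("[" ++ map summarise (transpose (map snd table)) ++ "]")
```

The last line printed, [**** *****], matches the Summary row of the table, and the per-builder lines match its Summary column.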

Raw information like this might be useful in itself, perhaps combined with a 
suitable grouping of the builders into (currently) 6.6 and head. But the 
information could also be combined and summarized in various ways that would 
hopefully give better insight into the state of things. Initially, a simple 
lossless compression scheme could be used that just grouped together the test 
cases with identical result tables. This would hopefully create some large 
groups of all-OK cases and also group together some of the different ways of 
particular test cases.
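Such a lossless grouping could be sketched in Haskell as follows, assuming each result table is stored as one string of result characters per builder, with the builders always listed in the same order. The names groupIdentical, TestWay and ResultTable, and the test cases testA and testB in the example data, are hypothetical:

```haskell
import qualified Data.Map.Strict as Map

-- A test case together with its way, e.g. "barton-mangler-bug(optasm)".
type TestWay = String

-- One string of result characters per builder, builders in a fixed order.
type ResultTable = [String]

-- Lossless compression: collect all test-case/way pairs that have exactly
-- the same result table, keeping them in their original order.
groupIdentical :: [(TestWay, ResultTable)] -> Map.Map ResultTable [TestWay]
groupIdentical entries =
  Map.fromListWith (flip (++)) [ (tbl, [tw]) | (tw, tbl) <- entries ]

-- Example data for two builders; testA and testB are hypothetical.
examples :: [(TestWay, ResultTable)]
examples =
  [ ("barton-mangler-bug(optasm)", ["      ++++", "          "])
  , ("testA(normal)",              ["          ", "          "])
  , ("testB(optasm)",              ["          ", "          "])
  ]

main :: IO ()
main = mapM_ report (Map.toList (groupIdentical examples))
  where
    -- Print the group's members, then its shared table once.
    report (tbl, tws) = do
      putStrLn (show (length tws) ++ " test case(s): " ++ unwords tws)
      mapM_ (\row -> putStrLn ("  [" ++ row ++ "]")) tbl
```

Here testA(normal) and testB(optasm) share an all-OK table and end up in one group, while barton-mangler-bug(optasm) forms a group of its own. Because the map is keyed on the whole table, only exactly identical tables are merged, so no information is lost.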

Additional possibilities present themselves if we allow lossy compression. For 
example, we might summarize all the ways of a particular test case, or 
summarize a test case over all builders in a particular group. However, 
summarizing test results into just "identical" and "changed" might be too 
crude in practice.
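As an illustration of the lossy case, the ways of a test case could be merged into a single table by summarising each builder/day cell across the per-way tables, using the same rule as the Summary row and column of the example. This is a minimal Haskell sketch; mergeTables and the example data are hypothetical, and it assumes all tables list the same builders and days in the same order:

```haskell
import Data.List (nub, transpose)

-- The summary rule read from the example table: ignore '.', show a single
-- repeated result as itself, otherwise '*'. ('.' for an all-'.' input is
-- an assumption.)
summarise :: String -> Char
summarise cells = case nub (filter (/= '.') cells) of
  []  -> '.'
  [c] -> c
  _   -> '*'

-- Lossy compression: merge several result tables (one per way, say) into
-- one by summarising each builder/day cell across all the tables.
-- Assumes every table has the same builders and days in the same order.
mergeTables :: [[String]] -> [String]
mergeTables tables = map (map summarise . transpose) (transpose tables)

main :: IO ()
main = mapM_ (\row -> putStrLn ("[" ++ row ++ "]")) (mergeTables [wayA, wayB])
  where
    -- Hypothetical tables for two ways of one test case, two builders.
    wayA = ["          ", "      ++++"]
    wayB = ["          ", "          "]
```

Every cell where the ways disagree turns into *, which shows exactly where information is lost, and also why "identical" and "changed" alone may be too crude.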

(It seems that something useful would result if the test results were easily 
accessible from a Haskell program, which may well be easy to arrange.)

Best regards
Thorkil



More information about the Cvs-ghc mailing list