Suggestions for summarizing buildbot test results
Thorkil Naur
naur at post11.tele.dk
Wed Apr 4 08:37:09 EDT 2007
Hello,
I have followed the development of various reporting formats used for the
daily buildbot test results with interest. Here are some suggestions. My main
concern is to make it easy to follow the state of the individual test case in
some detail so that it becomes easier to see which test cases may need
detailed investigation and perhaps also identify particularly problematic
environments for a test case.
The basic idea is to have a table like the following for each test case and
way, for example barton-mangler-bug(optasm). Please view it in a fixed-width
font:
                   0123456789 Summary
tatd2 x86 OSX head
tatd2 x86 OSX 6.6        ++++    *
tatd2 PPC OSX head .|.|.|.|.|    |
tatd2 PPC OSX 6.6  ||--          *
tnaur PPC OSX 6.6  .-.-.-.-.-    -
Summary            **** *****
In this table, time runs across and builders run down, so each column
represents a time interval. I think of an interval as a day, corresponding to
the frequency with which builds happen in most cases, although I know that
this does not always hold. Let us leave that problem till later. In the
example here, I have arbitrarily shown 10 time intervals, which could be the
latest 10 days.
Each row then represents a builder and the table contains a single character
indication of the result of the test case as run by that builder on that day.
The character represents the result using this code:
Expected OK: (i.e. blank)
Unexpected OK: -
Expected Fail: |
Unexpected Fail: +
No information: .
Mixed: * (used in summaries)
In selecting these particular characters, I have tried to be clever: the
Expected/Unexpected property is represented by blank/- and the OK/Fail
property by blank/|, so that combining Unexpected with Fail gives +. But this
choice, not least the use of blank to represent Expected OK, is certainly
debatable.
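
The coding can be written down as a small function. What follows is only a
sketch in Python; the name result_char and its two parameters are made up for
illustration and do not correspond to anything in buildbot or the testsuite
driver.

```python
from typing import Optional


def result_char(expected: Optional[bool], ok: Optional[bool]) -> str:
    """Return the single-character code for one test result.

    expected: True if the outcome was the expected one, None if unknown.
    ok:       True if the test passed, None if unknown.
    (Hypothetical helper; the names are illustrative only.)
    """
    if expected is None or ok is None:
        return "."                 # No information
    horizontal = not expected      # Unexpected contributes the '-' stroke
    vertical = not ok              # Fail contributes the '|' stroke
    if horizontal and vertical:
        return "+"                 # '-' laid over '|'
    if horizontal:
        return "-"                 # Unexpected OK
    if vertical:
        return "|"                 # Expected Fail
    return " "                     # Expected OK
```

Treating the two properties as independent strokes is what makes + fall out as
the overlay of - and |.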
"No information" would be used in case a particular builder has had nothing to
say for the test case on that particular day.
So the table tells us, for example, that the builder tatd2 x86 OSX head has
passed the test on all 10 days and that tatd2 x86 OSX 6.6 has unexpectedly
started to fail from build day 6 onwards.
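The summaries are constructed in a hopefully obvious manner: a list of results
that all agree is summarized by that single, unchanged result, and a list in
which the results differ is displayed as "changed" (*). As the tatd2 PPC OSX
head row shows, "No information" entries do not count as a change.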
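
One possible reading of the summary rule, as a Python sketch. It assumes that
"No information" entries are ignored and that a list with nothing but such
entries is summarized as "." (that last case is my guess; the example does
not cover it). The column positions of the partially filled rows are likewise
an assumption, and the name summarize is made up.

```python
def summarize(results: str) -> str:
    """Collapse a sequence of result characters into one summary character.

    Assumption: '.' (no information) is ignored; if nothing else is
    present the summary is '.'.
    """
    seen = set(results) - {"."}
    if not seen:
        return "."
    if len(seen) == 1:
        return seen.pop()          # single, unchanged result
    return "*"                     # mixed / changed


# The five builder rows of the example, ten days each (positions assumed).
rows = [
    "          ",   # tatd2 x86 OSX head
    "      ++++",   # tatd2 x86 OSX 6.6
    ".|.|.|.|.|",   # tatd2 PPC OSX head
    "||--      ",   # tatd2 PPC OSX 6.6
    ".-.-.-.-.-",   # tnaur PPC OSX 6.6
]

# Summary column: one character per builder.
row_summaries = [summarize(r) for r in rows]

# Summary row: one character per day, taken over all builders.
column_summary = "".join(summarize("".join(col)) for col in zip(*rows))
```

With these rows, row_summaries comes out as blank, *, |, *, - and
column_summary as "**** *****".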
Raw information like this might be useful in itself, perhaps combined with a
suitable grouping of the builders into (currently) 6.6 and head. But the
information could also be combined and summarized in various ways that would
hopefully give improved insight into the state of things. Initially, a
simple lossless compression scheme could be used that just groups the test
cases with identical result tables. This would hopefully create some
significant groups of all-OK cases and also, hopefully, group together some
of the different ways of particular test cases.
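
The lossless grouping could be as simple as using the whole table as a
dictionary key. Again a Python sketch only: group_identical is a made-up name,
and apart from barton-mangler-bug the test names and the two-builder,
three-day tables below are invented purely to show the mechanics.

```python
from collections import defaultdict


def group_identical(tables):
    """Group test-case names by identical result table.

    tables maps a test-case name to a sequence of row strings, one row
    per builder. Returns a dict from table (as a tuple) to list of names.
    """
    groups = defaultdict(list)
    for name, table in tables.items():
        groups[tuple(table)].append(name)
    return dict(groups)


# Invented sample data: two builders, three days.
all_ok = ("   ", "   ")
sample = {
    "barton-mangler-bug(normal)": all_ok,
    "barton-mangler-bug(optasm)": (" ++", "   "),
    "hypothetical-test(normal)": all_ok,
    "hypothetical-test(optasm)": all_ok,
}

groups = group_identical(sample)
```

Here the three all-OK entries end up in one group and the failing optasm way
in a group of its own, so nothing is lost but the report becomes shorter.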
Additional possibilities present themselves if we allow lossy compression. For
example, we might summarize all the ways of a particular test case, or
summarize a test case over all builders in a particular group. However,
summarizing test results into just "identical" and "changed" might be too
crude in practice.
(It seems that something useful would result if the test results were somehow
easily accessible in a Haskell program, something that could very well be
easy to arrange.)
Best regards
Thorkil