NoFib for benchmarking vs testing
Mon, 17 Dec 2001 16:25:47 -0000
> Looking through NoFib some more, I am increasingly getting the
> impression that quite a few of the tests aren't particularly useful
> benchmarks. Currently NoFib is both a benchmark suite and a test
> suite. I think it might be useful to separate those roles out,
> giving us separate "test" and "benchmark" modes. When run in
> "benchmark" mode, we would ignore tests that don't make good
> benchmarks.

Well, just about anything makes a good benchmark. Remember, we measure
several properties that don't vary randomly from run to run:
allocations, GC copies, possibly "instructions executed", and size of
the program and individual object modules. Furthermore, we use the
source code to benchmark the *compiler*, comparing the time the compiler
takes to compile a module against previous versions.

So even when the runtime is too small to make a useful benchmark,
there's useful information in all of the above categories, which I
wouldn't like to lose.

The current nofib-analyse program ignores all runtimes smaller than a
certain threshold when calculating its geometric means. I think this is
the right way to go: perhaps you want an option to disable some of the
fast-running tests when running benchmarks that depend on CPU time
measurements, but I doubt we will use such an option day-to-day because
of all the other useful information we get from such tests.

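The thresholding idea above can be sketched roughly as follows. This is an
illustrative Python sketch, not the actual nofib-analyse code (which is
written in Haskell); the 0.2-second threshold, the function names, and the
sample runtimes are all assumptions for the sake of the example.

```python
import math

# Sketch of nofib-analyse-style summarising: compare two sets of runtimes,
# drop any test whose baseline runtime falls below a threshold (too fast to
# measure reliably), then summarise the remainder with a geometric mean.
# The 0.2s threshold is an assumed value, not nofib's actual one.

def geometric_mean(xs):
    """Geometric mean of a non-empty list of positive numbers."""
    return math.exp(sum(math.log(x) for x in xs) / len(xs))

def runtime_ratios(baseline, new, threshold=0.2):
    """Ratios new/baseline for tests whose baseline runtime is long
    enough to be a meaningful CPU-time measurement."""
    return [new[t] / baseline[t]
            for t in baseline
            if t in new and baseline[t] >= threshold]

# Hypothetical runtimes in seconds, keyed by test name.
baseline = {"queens": 0.01, "anna": 2.0, "circsim": 4.0}
new      = {"queens": 0.02, "anna": 1.0, "circsim": 2.0}

ratios = runtime_ratios(baseline, new)   # "queens" is ignored: too fast
print(round(geometric_mean(ratios), 3))  # prints 0.5
```

Note that "queens" doubled in runtime but is excluded entirely, so the
summary reflects only the measurements we can trust, which is the point of
the threshold.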
Also: multiple "modes" are generally a bad idea because they lead to
testing headaches. Better to only have a single way to run something if
you can (we've learned this lesson the hard way!). In this case I think
having two modes in which to run nofib is justified, but let's keep it
to just those two.