NoFib for benchmarking vs testing

Simon Marlow simonmar@microsoft.com
Mon, 17 Dec 2001 16:25:47 -0000


> Looking through NoFib some more, I am increasingly getting the
> impression that quite a few of the tests aren't particularly useful
> benchmarks.
>
> Currently NoFib is both a benchmark suite and a test suite.

Yes.

> I think it might be useful to separate those roles out, giving us
> separate "test" and "benchmark" modes.
>
> When run in "benchmark" mode, we would ignore tests that don't make
> good benchmarks.

Well, just about anything makes a good benchmark.  Remember, we measure
several properties that don't vary randomly from run to run:
allocations, GC copies, possibly "instructions executed", and size of
the program and individual object modules.  Furthermore, we use the
source code to benchmark the *compiler*, comparing the time the compiler
takes to compile a module against previous versions.

So even when the runtime is too small to make a useful benchmark,
there's useful information in all of the above categories, which I
wouldn't like to lose.

The current nofib-analyse program ignores all runtimes smaller than a
certain threshold when calculating its geometric means.  I think this is
the right way to go: perhaps you want an option to disable some of the
fast-running tests when running benchmarks that depend on CPU time
measurements, but I doubt we will use such an option day-to-day because
of all the other useful information we get from such tests.
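
As a rough illustration of the thresholding idea (a sketch only, not the
actual nofib-analyse code, which is not shown in this message), the
following computes a geometric mean of runtime ratios while skipping
programs that run too quickly to time reliably. The names
RUNTIME_THRESHOLD, geometric_mean and runtime_ratio_mean, and the 0.2
second cutoff, are assumptions made up for this example.

```python
import math

# Hypothetical cutoff in seconds; the threshold nofib-analyse really uses
# is not stated in this message.
RUNTIME_THRESHOLD = 0.2


def geometric_mean(values):
    """Geometric mean of positive numbers, or None for an empty list."""
    if not values:
        return None
    # Averaging logarithms avoids overflow from multiplying many ratios.
    return math.exp(sum(math.log(v) for v in values) / len(values))


def runtime_ratio_mean(baseline, candidate, threshold=RUNTIME_THRESHOLD):
    """Geometric mean of candidate/baseline runtime ratios.

    Programs whose runtime falls below the threshold in either run are
    left out of the mean, since their timings are mostly noise.
    """
    ratios = [
        candidate[name] / baseline[name]
        for name in baseline
        if name in candidate
        and baseline[name] >= threshold
        and candidate[name] >= threshold
    ]
    return geometric_mean(ratios)
```

Only the CPU-time figures are filtered here. Deterministic measurements
such as allocations or module sizes could be summarised with
geometric_mean directly, with no threshold, so the fast-running programs
still contribute to those results.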

Also: multiple "modes" are generally a bad idea because they lead to
testing headaches.  Better to only have a single way to run something if
you can (we've learned this lesson the hard way!).  In this case I think
having two modes in which to run nofib is justified, but let's keep it
at that.

Cheers,
	Simon