[Haskell] ANNOUNCE: The Fibon benchmark suite (v0.2.0)

Jason Dagit dagit at codersbase.com
Tue Nov 9 20:58:25 EST 2010

On Tue, Nov 9, 2010 at 5:47 PM, David Peixotto <dmp at rice.edu> wrote:

> On Nov 9, 2010, at 3:45 PM, Jason Dagit wrote:
>
>> I have a few questions:
>>   * What differentiates fibon from criterion?  I see both use the
>> statistics package.
>
> I think the two packages have different benchmarking targets.
>
> Criterion allows you to easily test individual functions and gives some
> help with benchmarking in the presence of lazy evaluation. If some code does
> not run for long, it will run it multiple times to get sensible
> timings. Criterion does a much more sophisticated statistical analysis of
> the results, but I hope to incorporate that into the Fibon analysis in the
> future.
>
> Fibon is a more traditional benchmarking suite like SPEC or nofib. My
> interest is using it to test compiler optimizations. It can only benchmark
> at the whole program level by running an executable. It checks that the
> program produces the correct output, can collect extra metrics generated by
> the program, separates collecting results from analyzing results, and
> generates tables directly comparing the results from different benchmark
> runs.
>
>>   * Does it track memory statistics?  I glanced at the FAQ but didn't see
>> anything about it.
>
> Yes, it can read memory statistics dumped by the GHC runtime. It has
> built-in support for reading the stats dumped by `+RTS -t --machine-readable`
> which includes things like bytes allocated and time spent in GC.
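
To make the RTS statistics point concrete, here is a minimal sketch of
reading that output. It assumes the runtime writes a Haskell-readable list
of (key, value) string pairs, so plain `readMaybe` is enough. The names
`sample`, `parseRtsStats`, and `lookupStat` are hypothetical, the values are
made up, and the exact keys may differ between GHC versions. This is not
Fibon's actual parser.

```haskell
import Text.Read (readMaybe)

-- Hypothetical sample of `+RTS -t --machine-readable` output.
-- The numbers are invented for illustration; key names are an assumption.
sample :: String
sample = unlines
  [ " [(\"bytes allocated\", \"51320\")"
  , " ,(\"num_GCs\", \"1\")"
  , " ,(\"GC_cpu_seconds\", \"0.25\")"
  , " ]"
  ]

-- Parse the whole dump as a list of (key, value) string pairs.
-- Returns Nothing if the text is not in the assumed format.
parseRtsStats :: String -> Maybe [(String, String)]
parseRtsStats = readMaybe

-- Look up one statistic by key and read its value as a number.
-- Returns Nothing if the key is absent or the value is not numeric.
lookupStat :: String -> [(String, String)] -> Maybe Double
lookupStat key stats = lookup key stats >>= readMaybe

main :: IO ()
main =
  case parseRtsStats sample of
    Nothing    -> putStrLn "could not parse RTS stats"
    Just stats -> do
      print (lookupStat "bytes allocated" stats)
      print (lookupStat "GC_cpu_seconds" stats)
```

In a real harness the text would come from the benchmarked program's
statistics output rather than from a string literal.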

Oh, I see.  In that case, it's more similar to darcs-benchmark.  Except that
darcs-benchmark is tailored specifically to benchmarking darcs.  Where they
overlap is parsing the RTS statistics, running the whole program, and
tabular reports.  Darcs-benchmark adds to that an embedded DSL for
specifying operations to do on the repository between benchmarks (and
translating those operations to runnable shell snippets).

I wonder if Fibon and darcs-benchmark could share common infrastructure
beyond the statistics package.  It sure sounds like it to me.  Perhaps some
collaboration is in order.

>>   * Are the numbers in the sample output seconds or milliseconds?  What is
>> the stddev (e.g., what does the distribution of run-times look like)?
>
> I'm not sure which results you are referring to exactly (the numbers in the
> announcement were lines of code). I picked benchmarks that all ran for at
> least a second (and hopefully longer) with compiler optimizations enabled.
> On an 8-core Xeon, the median time over all benchmarks is 8.43 seconds, mean
> time is 12.57 seconds and standard deviation is 14.56 seconds.
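
For reference, the three summary figures quoted above can be computed along
these lines. This is only a sketch over a hypothetical list of timings, not
the Fibon data, and it assumes the sample standard deviation (dividing by
n - 1); the message does not say which variant was used. The function names
are my own.

```haskell
import Data.List (sort)

-- Arithmetic mean; Nothing for an empty list.
mean :: [Double] -> Maybe Double
mean [] = Nothing
mean xs = Just (sum xs / fromIntegral (length xs))

-- Median: the middle element of the sorted list, or the average of the
-- two middle elements when the length is even.
median :: [Double] -> Maybe Double
median [] = Nothing
median xs
  | even n    = Just ((s !! (h - 1) + s !! h) / 2)
  | otherwise = Just (s !! h)
  where
    s = sort xs
    n = length xs
    h = n `div` 2

-- Sample standard deviation (divides by n - 1); needs at least two values.
stddev :: [Double] -> Maybe Double
stddev xs
  | n < 2     = Nothing
  | otherwise = Just (sqrt (sum [(x - m) ^ (2 :: Int) | x <- xs] / fromIntegral (n - 1)))
  where
    n = length xs
    m = sum xs / fromIntegral n

-- Hypothetical per-benchmark running times in seconds.
timings :: [Double]
timings = [2, 4, 4, 4, 5, 5, 7, 9]

main :: IO ()
main = do
  print (median timings)
  print (mean timings)
  print (stddev timings)
```

A median well below the mean, together with a standard deviation larger than
the mean, usually indicates a right-skewed set of times: a few long-running
benchmarks pull the mean up.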

I probably read your email too fast, sorry.  Thanks for the clarification.

