[Haskell-cafe] How to correctly benchmark code with Criterion?

Thu Oct 18 11:23:25 CEST 2012

Dear list,

During the past few days I have spent a lot of time trying to figure out how to write Criterion
benchmarks so that the results don't get skewed by lazy evaluation. I want to benchmark different
versions of an algorithm doing numerical computations on a vector. For that I need to create an
input vector containing a few thousand elements. I decided to create random data, but that really
doesn't matter - I could just as well have used infinite lists instead of random ones.

My problem is that I am not certain whether I am creating my benchmark correctly. I wrote a
function that creates the data like this:

dataBuild :: RandomGen g => g -> ([Double], [Double])
dataBuild gen = (take 6 $ randoms gen, take 2048 $ randoms gen)

And I create the benchmark like this:

bench "Lists" $ nf L.benchThisFunction (L.dataBuild gen)

The question is how to generate the data so that its evaluation won't be included in the
benchmark. I already asked this question on StackOverflow (
http://stackoverflow.com/questions/12896235/how-to-create-data-for-criterion-benchmarks#comment17466915_12896235 )
and got the answer to use evaluate + force. After spending a day testing this approach I came to
the conclusion that it does not seem to influence the results of the benchmark in any way (I
tried things like unsafePerformIO + threadDelay). On the other hand, I looked into the sources of
criterion and I see that the benchmark code is run like this: evaluate (rnf (f x))
I am a Haskell newbie and perhaps don't interpret this correctly, but to me it looks as though
criterion does not evaluate the possibly unevaluated parameter x before running the benchmark,
but instead evaluates only the final result. Can someone explain how exactly this works and how
I should write my benchmarks so that the results are correct?
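
For concreteness, this is roughly how I understand the evaluate + force suggestion. It is only a
minimal sketch and may well not be the right way to do it: benchThisFunction below is a
hypothetical stand-in for my real L.benchThisFunction, and the use of newStdGen to obtain the
generator is an assumption made just to keep the example self-contained.

```haskell
import Control.DeepSeq (force)
import Control.Exception (evaluate)
import Criterion.Main (bench, defaultMain, nf)
import System.Random (RandomGen, newStdGen, randoms)

-- Same data builder as above.
dataBuild :: RandomGen g => g -> ([Double], [Double])
dataBuild gen = (take 6 $ randoms gen, take 2048 $ randoms gen)

-- Hypothetical stand-in for L.benchThisFunction (not the real algorithm).
benchThisFunction :: ([Double], [Double]) -> Double
benchThisFunction (coeffs, signal) = sum coeffs * sum signal

main :: IO ()
main = do
  gen <- newStdGen
  -- force builds a thunk that, when demanded, evaluates the whole pair to
  -- normal form; evaluate demands it right here, before any benchmark runs.
  input <- evaluate (force (dataBuild gen))
  defaultMain
    [ bench "Lists" $ nf benchThisFunction input ]
```

Here force comes from the deepseq package, evaluate from base, and newStdGen and randoms from the
random package.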

Janek