Accessing new heap costs more than old?

Sat Sep 3 13:14:07 EDT 2005

Hello folks,

I wonder if anybody can shed any light on the strange behaviour I've
been seeing with some of my benchmarks. The story is..

While measuring execution times of various AVL routines on random data
I found that insertion was taking far longer than deletion (over twice
as long). This is surprising because if anything deletion is the more
complex operation. Anyway after much struggling and experimentation with
different options/inlining etc I failed to fix this so tried the same
tests with Data.Intmap and got similar unexpected results.

But I've now found that the root cause seems to be a subtle difference
in the two tests. For insertion the test was cummulative, so each new
insertion was on the tree resulting from the previous insertion. But
for deletion the deletion was always done on the same tree. If I modify
the insertion test to work the same way as deletion then sure enough,
insertion is faster than deletion (as I would expect). The same is true
for Data.IntMap too. (The insertion speeds for the two modes differ by
a factor of 2.8..2.9 for both Data.Tree.AVL and Data.IntMap). 

But I can't think of a plausible explanation for this. The overall heap
burn rate should be about the same in each case, as should the overall
amount of live (non-garbage) heap records.

I thought maybe it might be a cache effect, but if anything I would
expect caching to favour the cummulative mode (new allocations should
displace old from the cache AFAIK). Also profiling shows that the
cumulative case performs slightly fewer allocations (as I'd expect
because it starts with an empty tree). 

Anyway, just thought I'd mention it in the hope that there might be
something that can be done about it. The cummulative case seems like
it would be more typical of real world code, so taking a factor of
3 or so performance hit is undesirable IMHO, but may well
be unavoidable :-(

Regards
--
Adrian Hey