Fwd: Map / IntMap laziness (was: export toDescList from Data.Map)

Scott Dillard sedillard at gmail.com
Fri Sep 26 14:01:48 EDT 2008


(Sorry for the spam, but I don't understand why the body of my messages is
being stripped. Something to do with the attachments?)

---------- Forwarded message ----------
From: Scott Dillard <sedillard at gmail.com>
Date: Fri, Sep 26, 2008 at 11:57 AM
Subject: Map / IntMap laziness (was: export toDescList from Data.Map)
To: Evan Laforge <qdunkan at gmail.com>
Cc: libraries at haskell.org



On Fri, Sep 26, 2008 at 8:20 AM, Evan Laforge <qdunkan at gmail.com> wrote:

> Of course, anything that boosts performance is a win.  Does anyone
> know of any Map benchmarks?  I suppose I could write some, but I'd be
> sort of surprised if someone else hasn't already done that.  Then we
> could throw in all the various avl / redblack / whatever map variants
> in there and see how they stack up.


I'm sure everyone has a benchmark floating around; I've attached the one
I've been using. As we all know from following the shootout rankings, a
single benchmark is pointless and misleading. Better to have targeted
benchmarks for specific use patterns. The problem is that search trees
are used in so many ways. I've divided the space into two domains,
inputs and queries, and you get a single micro-benchmark by picking one
thing from each domain. Here's what I've got:

Inputs :
 - Random
 - Random with duplicates
 - Sequential
 - The union of many small (100 or so) random sets

Queries :
 - Sum all keys (or values)
 - Sum a random subset of keys (may or may not be present in the map)

That last query takes an additional parameter: how large a subset to
search for. This is important to capture the case where you build a map
but only query it a few times. I can't think of a time I've actually
done this, but laziness is a big win here, just as in "take n . sort"
for big lists and small n.

I've made an optionally-lazy IntMap (attached). I removed the bangs
from the tree node constructors and replaced them with seqs in the
functions. I've added strictness only to the "one-at-a-time" functions
(insert, delete, update, etc.) and let all the functions of type
Map -> Map -> Map become lazy. Here is the speedup I get for performing
a single lookup on the result of the union of 10000 lists, each of
size 100.

STRICT:
  976,823,060 bytes allocated in the heap
  572,439,796 bytes copied during GC (scavenged)
    3,426,580 bytes copied during GC (not scavenged)
   31,641,600 bytes maximum residency (21 sample(s))
        1864 collections in generation 0 (  2.76s)
          21 collections in generation 1 (  1.76s)
          84 Mb total memory in use
    MUT   time    3.70s  (  3.79s elapsed)
    GC    time    4.52s  (  4.58s elapsed)
    Total time    8.22s  (  8.38s elapsed)
    %GC time      55.0%  (54.7% elapsed)
    Productivity  45.0% of total user, 44.1% of total elapsed
  real    0m8.391s
  user    0m8.217s
  sys    0m0.172s

LAZY:
  740,632,056 bytes allocated in the heap
  147,902,632 bytes copied during GC (scavenged)
    3,316,016 bytes copied during GC (not scavenged)
   32,079,872 bytes maximum residency (7 sample(s))
        1412 collections in generation 0 (  0.60s)
           7 collections in generation 1 (  0.42s)
          66 Mb total memory in use
    MUT   time    1.96s  (  2.02s elapsed)
    GC    time    1.02s  (  1.09s elapsed)
    Total time    2.98s  (  3.10s elapsed)
    %GC time      34.2%  (35.0% elapsed)
    Productivity  65.8% of total user, 63.3% of total elapsed
  real    0m3.114s
  user    0m2.984s
  sys    0m0.132s


This difference quickly disappears as the size of the query set increases,
but I've not been able to get the lazy version to go slower.
In all use-cases that I've tried, lazy IntMaps and Maps performed no worse
than the current strict implementations.
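
For concreteness, here is roughly what the change looks like, sketched
on a toy tree type rather than the real IntMap internals (bin is an
illustrative smart constructor, and the union has toy semantics):

  -- Before: strict fields force each subtree as soon as a node is built.
  --   data Tree a = Bin !(Tree a) !(Tree a) | Tip a | Nil

  -- After: lazy fields; strictness is re-added with seq where wanted.
  data Tree a = Bin (Tree a) (Tree a) | Tip a | Nil

  -- The one-at-a-time functions (insert, delete, ...) build nodes
  -- through a forcing constructor, so they stay strict as before:
  bin :: Tree a -> Tree a -> Tree a
  bin l r = l `seq` r `seq` Bin l r

  -- Tree -> Tree -> Tree functions build the result lazily; subtrees
  -- are only computed when a query actually walks into them:
  union :: Tree a -> Tree a -> Tree a
  union Nil t   = t
  union t   Nil = t
  union (Tip a) _ = Tip a   -- toy: left-biased
  union t (Tip _) = t
  union (Bin l1 r1) (Bin l2 r2) = Bin (union l1 l2) (union r1 r2)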

The difference between lazy and strict tree constructors is less
pronounced with Data.Map; I can't get it to make much of a difference.
Using seq instead of bangs seems to speed things up a little, but I
think this is spurious compiler behavior. Keeping the tree balanced
limits how much work you can put off, so really the only benefit for
Data.Map is in functions like mapKeysMonotonic and mapWithKey, though I
still think it would be useful to have lazy versions of those.

The trie on which IntMap is based allows for a lot more laziness,
because its structure only changes along the path between the root and
the point of operation. This is why lazy unioning works so well. It
will become even more useful as we all start writing parallel code
(which of course we all plan to start doing Real Soon Now), because
"foldr insert" doesn't parallelize but "unions . map singleton" does.
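
To make that last point concrete (buildSeq, buildUnions and unionsPar
are illustrative names, not library functions):

  import qualified Data.IntMap as IM

  -- Inherently sequential: each insert needs the map built so far.
  buildSeq :: [(Int, v)] -> IM.IntMap v
  buildSeq = foldr (uncurry IM.insert) IM.empty

  -- A balanced tree of unions: the two halves are independent
  -- subproblems, so they could be evaluated in parallel.
  buildUnions :: [(Int, v)] -> IM.IntMap v
  buildUnions = unionsPar . map (uncurry IM.singleton)

  unionsPar :: [IM.IntMap v] -> IM.IntMap v
  unionsPar []  = IM.empty
  unionsPar [m] = m
  unionsPar ms  = unionsPar l `IM.union` unionsPar r
    where (l, r) = splitAt (length ms `div` 2) ms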

The issue then is how to expose this laziness. I think the semantics
(lazy or strict) of the currently exported functions should not be
changed. Instead there should be separate lazy functions, either named
like unionL or placed in a Lazy submodule. The submodule approach seems
better to me, but it would necessitate splitting the library into more
than two files: Data.Map.Base, Data.Map.Strict, Data.Map.Lazy, etc. The
current interface would be re-exported from Data.Map, unchanged.
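
A hypothetical sketch of that layout, just to illustrate the re-export
(not a worked-out design):

  -- Data/Map/Base.hs   : the Map type and shared internals
  -- Data/Map/Strict.hs : the current semantics, built on Base
  -- Data/Map/Lazy.hs   : the lazy variants, built on Base
  --
  -- Data/Map.hs then just re-exports the strict interface unchanged:
  module Data.Map (module Data.Map.Strict) where
  import Data.Map.Strict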

> To be honest, I had no particular rational reason for wanting IntMap
> (and in fact I'm not using it).  I just heard some mumbling on the
> list about "Data.Map so slow, IntMap so much faster especially for
> unions".  Though to be honest, probably what most matters to me is how
> much garbage an insert generates (and similarly how much the resulting
> structures will share).  I didn't run any tests or anything, so I
> wasn't that upset to not be able to use IntMap :)
>

IntMaps are currently the best option for pure, persistent arrays.
Better even than Data.Sequence. (I'm told.) I use them a lot. You heard
right about unioning: the above-mentioned benchmark takes 18 seconds
with a Map. IntMaps generate much less garbage because less of the tree
is mutated after an insert/delete.
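
A small sketch of the sharing point (nothing here beyond the ordinary
Data.IntMap API):

  import qualified Data.IntMap as IM

  -- An IntMap as a persistent array: "updating" index 1 makes a new
  -- version, and both versions stay usable. Only the short path from
  -- the root to index 1 is rebuilt; the rest of the tree is shared,
  -- which is why so little garbage is generated.
  main :: IO ()
  main = do
    let v0 = IM.fromList (zip [0 ..] "hello")
        v1 = IM.insert 1 'a' v0
    putStrLn (IM.elems v0)   -- "hello"
    putStrLn (IM.elems v1)   -- "hallo"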


> Points in time.  I was using Int64 at the time, but I'm actually using
> Doubles now.  Plain Data.Map is working fine in practice.


Yeah, for that use-case I don't think you'll beat a good balanced
binary tree, not in any language. A bit-trie won't really work, because
you could have duplicate entries (+0.0 and -0.0 compare equal but are
distinct values) and the keys wouldn't be stored in sorted order.
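
To illustrate the +/- 0 problem (plain Prelude, nothing map-specific):

  -- 0.0 and -0.0 compare equal as Doubles but are distinct values, so
  -- a trie keyed on the raw bits would store them as two entries.
  -- (Negative doubles also don't order correctly by their raw bits.)
  main :: IO ()
  main = do
    print (0.0 == (-0.0 :: Double))           -- True
    print (isNegativeZero (-0.0 :: Double))   -- True: distinguishable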

Scott