<div dir="ltr">(Sorry for the spam, but I don&#39;t understand why the body of my messages is being stripped. Something to do with the attachments?)<br><br><div class="gmail_quote">---------- Forwarded message ----------<br>

From: <b class="gmail_sendername">Scott Dillard</b> <span dir="ltr">&lt;<a href="mailto:sedillard@gmail.com">sedillard@gmail.com</a>&gt;</span><br>Date: Fri, Sep 26, 2008 at 11:57 AM<br>Subject: Map / IntMap laziness (was: export toDescList from Data.Map)<br>

To: Evan Laforge &lt;<a href="mailto:qdunkan@gmail.com">qdunkan@gmail.com</a>&gt;<br>Cc: <a href="mailto:libraries@haskell.org">libraries@haskell.org</a><br><br><br><div dir="ltr"><br><div class="gmail_quote">On Fri, Sep 26, 2008 at 8:20 AM, Evan Laforge <span dir="ltr">&lt;<a href="mailto:qdunkan@gmail.com" target="_blank">qdunkan@gmail.com</a>&gt;</span> wrote:<br>

<blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">

Of course, anything that boosts performance is a win. &nbsp;Does anyone<br>

know of any Map benchmarks? &nbsp;I suppose I could write some, but I&#39;d be<br>

sort of surprised if someone else hasn&#39;t already done that. &nbsp;Then we<br>

could throw in all the various avl / redblack / whatever map variants<br>

in there and see how they stack up.</blockquote><div><br>I&#39;m sure everyone has a benchmark floating around. I&#39;ve attached the one I&#39;ve been using.&nbsp; As we all know from following the shootout rankings, a single benchmark is pointless and misleading. Better to have targeted benchmarks for specific use patterns. The problem is that search trees are used in so many ways. I&#39;ve divided it up into two domains: inputs and queries. Then you get a single micro-benchmark by picking one thing in each domain. Here&#39;s what I&#39;ve got<br>


<br>Inputs : <br>&nbsp;- Random<br>&nbsp;- Random with duplicates<br>&nbsp;- Sequential <br>&nbsp;- The union of many small (100 or so) random sets<br><br>Queries :<br>&nbsp;- Sum all keys (or values)<br>&nbsp;- Sum a random subset of keys (may or may not be present in the map)<br>


<br>That last query takes an additional parameter, how large of a subset to search for. This is important to capture the case <br>when you build a map but only query it a few times. I can&#39;t think of when I&#39;ve done this, but laziness is a big win here, just<br>


as in &quot;take n . sort&quot; for big lists and small n.<br><br>I&#39;ve made an optionally-lazy IntMap (attached). I removed the bangs from the tree node constructors and replaced them with seqs in<br>the functions. I&#39;ve only added strictness to the &quot;one-at-a-time&quot; functions: insert, delete, update, etc. I&#39;ve let all the functions of type Map -&gt; Map -&gt; Map become lazy. Here is the speedup I get for performing a single lookup on the result of&nbsp; the union 10000 lists, each of size 100. <br>


<br>STRICT:<br>&nbsp; 976,823,060 bytes allocated in the heap<br>&nbsp; 572,439,796 bytes copied during GC (scavenged)<br>&nbsp;&nbsp;&nbsp; 3,426,580 bytes copied during GC (not scavenged)<br>&nbsp; 31,641,600 bytes maximum residency (21 sample(s))<br>


&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 1864 collections in generation 0 (&nbsp; 2.76s)<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 21 collections in generation 1 (&nbsp; 1.76s)<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 84 Mb total memory in use<br>&nbsp;&nbsp;&nbsp; MUT&nbsp;&nbsp; time&nbsp;&nbsp;&nbsp; 3.70s&nbsp; (&nbsp; 3.79s elapsed)<br>&nbsp;&nbsp;&nbsp; GC&nbsp;&nbsp;&nbsp; time&nbsp;&nbsp;&nbsp; 4.52s&nbsp; (&nbsp; 4.58s elapsed)<br>


&nbsp;&nbsp;&nbsp; Total time&nbsp;&nbsp;&nbsp; 8.22s&nbsp; (&nbsp; 8.38s elapsed)<br>&nbsp;&nbsp;&nbsp; %GC time&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 55.0%&nbsp; (54.7% elapsed)<br>&nbsp;&nbsp;&nbsp; Productivity&nbsp; 45.0% of total user, 44.1% of total elapsed<br>&nbsp; real&nbsp;&nbsp;&nbsp; 0m8.391s<br>&nbsp; user&nbsp;&nbsp;&nbsp; 0m8.217s<br>&nbsp; sys&nbsp;&nbsp;&nbsp; 0m0.172s<br>

<br>

LAZY:<br>&nbsp; 740,632,056 bytes allocated in the heap<br>&nbsp; 147,902,632 bytes copied during GC (scavenged)<br>&nbsp;&nbsp;&nbsp; 3,316,016 bytes copied during GC (not scavenged)<br>&nbsp; 32,079,872 bytes maximum residency (7 sample(s))<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 1412 collections in generation 0 (&nbsp; 0.60s)<br>


&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 7 collections in generation 1 (&nbsp; 0.42s)<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 66 Mb total memory in use<br>&nbsp;&nbsp;&nbsp; MUT&nbsp;&nbsp; time&nbsp;&nbsp;&nbsp; 1.96s&nbsp; (&nbsp; 2.02s elapsed)<br>&nbsp;&nbsp;&nbsp; GC&nbsp;&nbsp;&nbsp; time&nbsp;&nbsp;&nbsp; 1.02s&nbsp; (&nbsp; 1.09s elapsed)<br>&nbsp;&nbsp;&nbsp; Total time&nbsp;&nbsp;&nbsp; 2.98s&nbsp; (&nbsp; 3.10s elapsed)<br>


&nbsp;&nbsp;&nbsp; %GC time&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 34.2%&nbsp; (35.0% elapsed)<br>&nbsp;&nbsp;&nbsp; Productivity&nbsp; 65.8% of total user, 63.3% of total elapsed<br>&nbsp; real&nbsp;&nbsp;&nbsp; 0m3.114s<br>&nbsp; user&nbsp;&nbsp;&nbsp; 0m2.984s<br>&nbsp; sys&nbsp;&nbsp;&nbsp; 0m0.132s<br><br><br>This difference quickly disappears as the size of the query set increases, but I&#39;ve not been able to get the lazy version to go slower. <br>


In all use-cases that I&#39;ve tried, lazy IntMaps and Maps performed no worse than the current strict implementations. <br><br>The difference between lazy and strict tree constructors is less pronounced with Data.Map. I can&#39;t get it to make much of a difference.<br>


Using seq instead of bangs seems to speed things up a little, but I think this is spurious compiler behavior. Keeping the tree balanced<br>limits how much work you can put off, so really the only benefit for Data.Map is in functions like mapKeysMonotonic and mapWithKey, but I still think it would be useful to have lazy versions of those functions. The trie on which IntMap is based allows for a lot more laziness because its structure only changes along the path between the root and the point of operation. This is why lazy unioning works so well. This will become even more useful as we all start writing parallel code (which of course we all plan to start doing Real Soon Now) because &quot;foldr insert&quot; doesn&#39;t parallelize but &quot;unions . map singleton&quot; does.<br>


<br>The issue then is how to expose this laziness. I think the semantics (lazy or strict) of the currently exported functions should not be changed. Instead there should be lazy functions, either named like unionL, or put in a Lazy submodule. The submodule approach seems better to me but it would necessitate splitting up the library into more than 2 files. Data.Map.Base, Data.Map.Strict, Data.Map.Lazy, etc. The current interface would be rexported from Data.Map, unchanged.<br>


<br></div><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">To be honest, I had no particular rational reason for wanting IntMap<br>

(and in fact I&#39;m not using it). &nbsp;I just heard some mumbling on the<br>

list about &quot;Data.Map so slow, IntMap so much faster especially for<br>

unions&quot;. &nbsp;Though to be honest, probably what most matters to me is how<br>

much garbage an insert generates (and similarly how much the resulting<br>

structures will share). &nbsp;I didn&#39;t run any tests or anything, so I<br>

wasn&#39;t that upset to not be able to use IntMap :)<br>

<div></div></blockquote><div><br>They are currently the best option for pure, persistent arrays. Better even than Data.Sequence. (I&#39;m told.) I use them a lot. You heard right about unioning. The above mentioned benchmark takes 18 seconds with a Map. They generate much less garbage because less of the tree is mutated after an insert/delete. <br>


&nbsp;</div><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">Points in time. &nbsp;I was using Int64 at the time, but I&#39;m actually using<br>

Doubles now. &nbsp;Plain Data.Map is working fine in practice.</blockquote><div><br>Yeah for that use-case I don&#39;t think you&#39;ll beat a good balanced binary tree, not in any language. A bit-trie won&#39;t really work because you could have duplicate entries (+/- 0) and they wouldn&#39;t be stored in sorted order.<br>


<br>Scott<br></div></div><br></div>

</div><br></div>