<br><br><div class="gmail_quote">On Tue, Feb 21, 2012 at 3:58 PM, Ben Gamari <span dir="ltr"><<a href="mailto:bgamari.foss@gmail.com">bgamari.foss@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div class="im">On Tue, 21 Feb 2012 11:25:05 -0800, Johan Tibell <<a href="mailto:johan.tibell@gmail.com">johan.tibell@gmail.com</a>> wrote:<br>
> On Tue, Feb 21, 2012 at 11:20 AM, Ben Gamari <<a href="mailto:bgamari.foss@gmail.com">bgamari.foss@gmail.com</a>> wrote:<br>
> > This is certainly an option. It's actually been suggested that I use<br>
> > HashMaps instead of IntMaps to get more uniform use of the<br>
> > key-space. I have been a little worried that the cost of hashing would<br>
> > outweigh the benefit of using benefits of more shallow trees, but I<br>
> > suppose memory accesses are expensive. That being said, I could<br>
> > certainly just use the Enum instance.<br>
><br>
> In the case of Ints and newtypes thereof hashing is very cheap. A<br>
> no-op or a multiplication with a large prime.<br>
><br>
</div>Sure. In my application (machine learning) we were mapping from tokens<br>
(short strings, generally less than 15 bytes) to sequential newtype'd<br>
Ints (newtype Token = Token Int). However, the fact that these<br>
identifiers were sequential meant that IntMap's trees would be close to<br>
the worst-case log(N) depth.</blockquote><div><br></div><div>Either way, the tries only contain levels on which keys differ, so tree depths shouldn't vary by more than 1 or so.</div><div><br></div><div>HashMap uses lsb-first splitting in its current incarnation, so that you end up balancing contiguous keys across the trie. This is the main argument in favor of low-order-first splitting. The main argument against is that high-order-first splitting yields a standard binary search tree for lookup operations if you encode the pivots appropriately.</div>
<div><br></div><div>I don't recall which method is used by Johan's wide-fanout tries. I seem to recall that at wider fanouts you're happier (for GC reasons) churning the right spine in response to contiguous insertions, rather than churning the entire tree.</div>
<div><br></div><div>The takeaway: if you're seeing performance problems from contiguous insertion, it might be worth comparing with HashMap.</div><div><br></div><div>-Jan-Willem Maessen</div><div><br></div></div>