[Haskell-cafe] Data.IntMap union complexity

Clark Gaebel cgaebel at csclub.uwaterloo.ca
Fri Feb 24 04:22:36 CET 2012


The situation where I encountered this is doing a batch update of a map. Is
there an easy way to do that? I'm doing something like adding 20-or-so
elements to an existing map of a few thousand.
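
Concretely, here's a rough sketch of the two approaches I'm weighing, using
plain Data.IntMap (insertAll and unionAll are just illustrative names I made
up, not library functions):

import qualified Data.IntMap as IM
import Data.List (foldl')

-- Batch update by folding insert over the new pairs: O(m * min(n, W)).
insertAll :: [(Int, v)] -> IM.IntMap v -> IM.IntMap v
insertAll updates existing =
  foldl' (\m (k, v) -> IM.insert k v m) existing updates

-- Batch update by building a small map and merging: O(n + m).
-- union is left-biased, so putting the new pairs on the left means they
-- overwrite the old values, matching what repeated insert does.
unionAll :: [(Int, v)] -> IM.IntMap v -> IM.IntMap v
unionAll updates existing = IM.union (IM.fromList updates) existing

main :: IO ()
main = do
  let existing = IM.fromList [(i, i) | i <- [1 .. 3000 :: Int]]
      updates  = [(i * 97, 0) | i <- [1 .. 20]]
  print (insertAll updates existing == unionAll updates existing)  -- True

Both give the same left-biased result for the 20-into-a-few-thousand case;
the question is which one does less work.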

On Thu, Feb 23, 2012 at 10:13 PM, wren ng thornton <wren at freegeek.org> wrote:

> On 2/23/12 9:16 PM, Clark Gaebel wrote:
>
>> Looking at IntMap's left-biased 'union' function [1], I noticed that the
>> complexity is O(n+m) where n is the size of the left map, and m is the
>> size
>> of the right map.
>>
>> Since insertion [2] is O(min(n, W)) [ where W is the number of bits in an
>> Int ], wouldn't it be more efficient to just fold 'insert' over one of the
>> maps, for a complexity of O(m*min(n, W))? Since W is a constant, that is
>> at worst O(m*W) = O(m), as opposed to the current O(n+m).
>>
>
> The important things to bear in mind here are (1) that constant factors
> actually matter in practice, and (2) what's actually going on inside the
> structure. While O(min(n,W)) is correct, it's misleading to think of it as
> just a constant (or as just a linear function). Although not technically
> accurate, it's better to think of it as O(log n) in order to get an
> intuition for how it works. And O(m+n) is much nicer than O(m*log n).
>
> Doing a fold with insert means that we must pay the cost of traversing one
> of the maps entirely, plus the cost of walking the spine for a
> lookup/insert m times. With the merge function, by contrast, we only have
> to traverse the portions of the spines which intersect, and we only have
> to do it in one pass. In doing the fold, we're essentially ignoring the
> fact that the maps have a trie structure, since we have to traverse from
> the top for every insert; whereas the merge makes use of that structure to
> avoid redundant traversals of the upper part of the trie.
>
> Thus, the merge is doing less work. So, in theory, it should be faster.
> However, again, the thing to beware of is the constant factors. In
> particular, big-O algorithmic analysis doesn't really account for things
> like locality and cache coherence, so one should always be on the lookout
> for places where duplicating work is actually faster in practice. If you're
> curious, you can always implement your own union using the fold-with-insert
> method and then run some benchmarks.
>
> --
> Live well,
> ~wren
>
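
For anyone who wants to try the benchmarking suggestion, a minimal criterion
sketch might look like the following (unionViaInsert is a hypothetical
hand-rolled helper, not anything from containers):

import Criterion.Main (bench, defaultMain, whnf)
import qualified Data.IntMap as IM

-- Hand-rolled union via fold-with-insert: push every pair of the small map
-- into the big one. Like IM.union small big, the small map's values win.
unionViaInsert :: IM.IntMap v -> IM.IntMap v -> IM.IntMap v
unionViaInsert small big = IM.foldrWithKey IM.insert big small

main :: IO ()
main = do
  let big   = IM.fromList [(i, i) | i <- [1 .. 3000 :: Int]]
      small = IM.fromList [(i * 97, 0) | i <- [1 .. 20 :: Int]]
  -- IntMap's spine is strict, so whnf forces the whole merged result.
  defaultMain
    [ bench "union"            (whnf (IM.union small) big)
    , bench "fold-with-insert" (whnf (unionViaInsert small) big)
    ]

My guess is that union wins clearly when the two maps are of comparable size,
while for a handful of updates into a large map the fold may be competitive;
the numbers will have to settle it.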