[Haskell-cafe] ANN: unordered-containers - a new, faster hashing-based containers library

Max Bolingbroke batterseapower at hotmail.com
Wed Feb 23 09:45:47 CET 2011


On 23 February 2011 05:31, Johan Tibell <johan.tibell at gmail.com> wrote:
> On Tue, Feb 22, 2011 at 9:19 PM, Johan Tibell <johan.tibell at gmail.com> wrote:
>> Initial numbers suggest that lookup gets 3% slower and insert/delete
>> 6% slower. The upside is O(1) size.
>
> Can someone come up with a real world example where O(1) size is important?

I'm a bit sceptical that it is (I was not convinced by the earlier
strict-set-inclusion argument, since that's another Data.Map feature
I've never used). I thought of some other possibilities though:
 1. If copying an unordered-collection to a flat array you can improve
the constant factors (not the asymptotics) with O(1) size to
pre-allocate the array
 2. If building a map in a fixed point loop (I personally do this a
lot) where you know that the key uniquely determines the element, you
can test if a fixed point is reached in O(1) by just comparing the
sizes. Depending what you are taking a fixed point of, this may change
the asymptotics
 3. Some map combining algorithms work more efficiently when one of
their two arguments are smaller. For example, Data.Map.union is most
efficient for (bigmap `union` smallmap). If you don't care about which
of the two input maps wins when they contain the same keys, you can
improve constant factors by testing the size of the map input to size
(in O(1)) and flipping the arguments if you got (smallmap `union`
bigmap) instead of the desirable way round.

Personally I don't find any of these *particularly* compelling. But a
~6% slowdown for this functionality is not too bad - have you had a
chance to look at the core to see if the cause of the slowdown
manifests itself at that level? Perhaps it is possible to tweak the
code to make this cheaper.

Also, what was the size of the collections you used in your benchmark?
I would expect the relative cost of maintaining the size to get lower
as you increased the size of the collection.

Max



More information about the Haskell-Cafe mailing list