RFC: Adding a Hashable type class and HashMap/HashSet data types to HP

Johan Tibell johan.tibell at gmail.com
Thu Nov 18 17:05:57 EST 2010


Hi all,

This is a request for early feedback on something that I'd like to
propose for the next major HP release.

Milan Straka showed [1] that it's possible to design a persistent set
(and thus also map) for string-like keys that's about twice as fast as
the current Data.Set using hashing and a Patricia tree (i.e. IntMap).
The Clojure community has also managed to create a fast persistent
map, based on Bagwell's hash array mapped trie.

I'd like to propose that we get the necessary infrastructure in place
for creating fast and easy to use hash-based data structure. Here's
what I think is needed:

* A type class for things that can be hashed, Hashable.

class Hashable a where
    hash :: a -> Word

An alternative would be to have `hash` return an Int, which would make
the method easier to use with Haskell data structures, which are
indexed using Ints, but Word feels like the more correct type to me.

The Hashable type class would have to go in either base, containers,
or a new package (there's currently a hashable package on Hackage.)

* Instances of Hashable for common types, in particular ByteString and Text.

I already have fast instances for Text and ByteString, based on MurmurHash2 [2].

* Two new data types in the containers package, Data.HashMap and
Data.HashSet. These would be persistent data structures, initially
based on IntMap, and have the same or similar API to Data.Map and
Data.Set..

In the future we could also add mutable HashTables, operating in the
ST monad, or other hash-based data types (persistent or mutable).

Any comments?

Johan

1. http://research.microsoft.com/~simonpj/papers/containers/containers.pdf
2. http://sites.google.com/site/murmurhash/


More information about the Libraries mailing list