[Haskell-cafe] Pure hashtable library

Wed Aug 27 16:47:38 EDT 2008

Hello Jules,

Wednesday, August 27, 2008, 7:59:55 PM, you wrote:

>>> To get "much better efficient" than a trie, the hash function has to be
>>> so fast that it is faster than following (log n) pointers
>> 
>> afaiu, trie search follows n pointers 

> No.

> "n" is the number of strings in my data set (dictionary).

> If I have "n" strings the average string length is asymptotically, in 
> some sense, "log n". Of course for particular data sets it's may be more .

> But "log n" is the length of the shortest unique coding, it's also the
> number of characters you typically need to traverse before you have 
> reached a unique prefix, at which point your trie can short-circuit.

> I appreciate that I didn't define my terminology ;) You might have a 
> different "n".

> I repeat my challenge "Prove it". I will be interested to see how much a
> good immutable hash outperforms Data.Map.

> I would then also be interested to see how much it outperforms a decent
> Data.Map (such as your own AVL one) and a decent trie.

1) i don't have time/interest to do it, so i just pushed the idea

2) what you have proved there is that for set of n randomly chosen
strings trie lookup need O(log n) time - the same is true for hash

3) practical performance of trie will suffer by the need to follow
several trie levels each providing several elements to check. OTOH
hash lookup will usually need only 1-2 checks, including one check on
full string size

4) using ByteString for hash indexes would make lookups much faster

-- 
Best regards,
 Bulat                            mailto:Bulat.Ziganshin at gmail.com