[Haskell-cafe] Re: String Hashing

apfelmus apfelmus at quantentunnel.de
Mon Jun 18 13:39:45 EDT 2007


Thomas Conway wrote:
> On 6/18/07, apfelmus <apfelmus at quantentunnel.de> wrote:
>> Do you need the hash function for a hash table or for
>> fingerprints/signatures? In the former case, Tries are a much better
>> choice. For launching your own trie, see also
> 
> I'm actually using them for bucket addressing for external indexing
> with a linear hash table. (Yes, the hashing does count, because
> buckets are cached in memory.)

> It always seems something of a shame that
> if you want all the benefits of lazy functional programming, you too
> often have to settle for O(n log n) data structures.

 Trie it is,
 not balanced tree.
 A logarithm in this
 would be new to me. :)

As a side node, Mr. Exp says: 64 is large enough for the size needs of
any logarithm.

> type HashTable k v = TVar (Array Int (TVar [(k,v)]))

Don't you want a TArray Int [(k,v)]?

In any case, you could be able to set up an infinite trie and have lazy
evaluation allocate space as needed:

 type Trie a = Branch (TVar a) (Trie a) (Trie a)

    -- an infinite tree with different TVars at every branch
 {-# NOINLINE tree -#}
 tree :: a -> Trie a
 tree x = Binary (unsafePerformIO $ newTVarIO x) (tree x) (tree x)

The intention is that the different threads concurrently evaluate the
suspension as far as they need to lookup/insert a key. The associated
value can be put into the fresh TVar they find there. The tree structure
itself is left untouched. Of course, it is imperative that all threads
see the same TVars. I don't know how thunk updates are handled in a
concurrent setting, but as they are write-once only and referentially
transparent, i see no major problem with them.

Regards,
apfelmus

PS: Hm, it's probably safer to code the lazy evaluation in the trie
manually. As a side effect, you are then able to garbage collect unused
but expanded parts of the trie from time to time.



More information about the Haskell-Cafe mailing list