[Haskell-cafe] ANN: TernaryTrees-0.1.1.1 - An efficient ternary tree implementation of Sets and Maps

wren ng thornton wren at freegeek.org
Thu Jul 2 20:59:01 EDT 2009


Don Stewart wrote:
> wren:
>> Alex Mason wrote:
>>> TernaryTrees is a package that extends Data.Set ad Data.Map with some  
>>> ternary tree structures, based on the article  
>>> [http://www.pcplus.co.uk/node/3074/] .
>> For the string (or rather ByteString) version:
>>
>> http://hackage.haskell.org/cgi-bin/hackage-scripts/package/bytestring-trie
>>
>> Which has a number of other significant performance improvements (e.g.  
>> node fusion, ByteString instead of String) and a highly expressive  
>> interface. Because it uses ByteStrings it can trie any type which can be  
>> serialized into a vector of bits[1], albeit indirectly.
>>
>> The real trick with tries is not in just having them[2], it's in having  
>> the right interface to make use of what they're good at. For example, if  
>> I have multiple tries, I'd like to merge them without doing it element  
>> by element[3]. Or if I know I'm going to be making a number of similar  
>> queries, it'd be nice if I could cache my position in the trie[4] to  
>> avoid repeating the work for the prefixes of all my queries[5]. Using  
>> tricks like these leads to significant improvements over using them like  
>> hashtables; tries aren't hashtables just like lists aren't arrays.
> 
> Do you have benchmarks?


Somewhere in my email archive (care of Mark Wotton). I'll see if I can 
dig them up this weekend. The biggest issue here is finding nice 
datasets (and tasks) to give reasonable benchmarks for. Reading in all 
of /usr/dict (or the Brown corpus) and looking up all keys only gives 
one perspective (or two), and not necessarily the most helpful one for 
"real world" use. I haven't found any good dataset/task suites like 
there are for the Language Benchmarks Game, though I'd love to hear 
about one.

The tries /= hashtables comment stems from discussions on various 
haskell blogs with people inventing their own (or wanting to benchmark 
Data.Map vs hashtables vs tries vs bloomfilters). As a drop-in 
replacement tries will perform adequately, but they're nothing 
overwhelming; the overwhelming comes from changing the usage algorithms 
to match the "stride" of the datastructure. I don't think I have links 
to these discussions anymore to pull up code examples.

-- 
Live well,
~wren


More information about the Haskell-Cafe mailing list