[Haskell-cafe] Re: Mining Twitter data in Haskell and Clojure

wren ng thornton wren at freegeek.org
Tue Jun 15 16:18:16 EDT 2010


braver wrote:
> On Jun 14, 11:40 am, Don Stewart <d... at galois.com> wrote:
>> Oh, you'll want insertWith'.
>>
>> You might also consider bytestring-trie for the Graph, and IntMap for
>> the AdJList ?
> 
> Yeah, I saw jsonb using Trie and thought there's a reason for it.  But
> it's very API-poor compared with Map, e.g. there's not even a fold --
> should one toListBy first?

I find that surprising. Have you looked in Data.Trie.Convenience? The 
API of Data.Map is rather bloated so I've pushed most of it out of the 
main module in order to clean things up. There are only a small number 
of functions in the Data.Map interface I haven't had a chance to 
implement yet.

For folding, the `foldMap`, `foldr`, and `foldl` functions are provided 
via the Data.Foldable interface. The Data.Traversable class is also 
implemented if you need to make changes to the trie along the way. These 
all give generic folding over the values stored in the trie. If you need 
access to the keys during folding you can use `foldrWithKey`, though it 
has to reconstruct the keys, which doesn't sound good for your use case. 
`toListBy` is a convenience wrapper around `foldrWithKey` which supports 
list fusion, so it has the same advantages and disadvantages compared to 
the Foldable/Traversable functions.
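
To make the difference concrete, here is a minimal sketch of the two 
styles of folding, assuming the bytestring-trie package is installed. 
The `followers` trie and its contents are made up for illustration; only 
`fromList`, `toListBy`, and the Foldable instance come from the library.

```haskell
import qualified Data.ByteString.Char8 as B
import qualified Data.Foldable as F
import qualified Data.Trie as T

-- Hypothetical sample data: user name -> follower count.
followers :: T.Trie Int
followers = T.fromList
  [ (B.pack "alice", 3)
  , (B.pack "bob",   5)
  , (B.pack "carol", 2)
  ]

-- Values only, via Data.Foldable: the keys are never rebuilt.
total :: Int
total = F.sum followers

-- Keys and values, via toListBy: each key has to be reconstructed
-- from the path through the trie before it is handed to the function.
popular :: [B.ByteString]
popular = [ k | (k, n) <- T.toListBy (,) followers, n >= 3 ]

main :: IO ()
main = do
  print total
  print popular
```

If the fold only needs the values, the Foldable version is the one to 
prefer, since it skips the key reconstruction entirely.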

If there's a particular function you still need, let me know and I can 
add an implementation for it.



In terms of optimizing your code, one thing you'll surely want to do is 
to construct an intern table (Trie Int, IntMap ByteString) so that you 
only have to deal with Ints internally rather than ByteStrings. I 
haven't looked at your code yet to see how this would fit in, but it's 
almost always a requisite trick for handling large text corpora.
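
A rough sketch of what such an intern table might look like, again 
assuming bytestring-trie plus the containers package. The names 
`Intern`, `intern`, and `unintern` are hypothetical, chosen for this 
example, and are not part of any library.

```haskell
import qualified Data.ByteString.Char8 as B
import qualified Data.IntMap as IM
import qualified Data.Trie as T

-- | Two-way table: names to ids (Trie) and ids back to names (IntMap).
data Intern = Intern
  { toId   :: T.Trie Int
  , fromId :: IM.IntMap B.ByteString
  , nextId :: !Int
  }

emptyIntern :: Intern
emptyIntern = Intern T.empty IM.empty 0

-- | Return the id for a name, allocating a fresh one on first sight.
intern :: B.ByteString -> Intern -> (Int, Intern)
intern name tbl =
  case T.lookup name (toId tbl) of
    Just i  -> (i, tbl)
    Nothing ->
      let i = nextId tbl
      in ( i
         , Intern (T.insert name i (toId tbl))
                  (IM.insert i name (fromId tbl))
                  (i + 1)
         )

-- | Recover the original name, e.g. when printing results.
unintern :: Int -> Intern -> Maybe B.ByteString
unintern i = IM.lookup i . fromId

main :: IO ()
main = do
  let (a,  t1) = intern (B.pack "alice") emptyIntern
      (b,  t2) = intern (B.pack "bob")   t1
      (a', t3) = intern (B.pack "alice") t2
  print (a, b, a')
  print (unintern b t3)
```

Each ByteString is looked up once on input; after that the graph and 
adjacency lists can be keyed on plain Ints, and the IntMap is only 
consulted when names need to be shown again.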

-- 
Live well,
~wren

