[Haskell-cafe] Layered maps

Alex Rozenshteyn rpglover64 at gmail.com
Sat Oct 9 02:02:05 EDT 2010


This came up as I was doing homework for natural language processing.

I'm constructing a trigram model from training data, but I also need the
bigram and unigram counts.  I could, for each triple of characters, add the
3 tuple to a trigram map and increment its count (I know I'm not actually
mutating anything), add the last two letters to a bigram map, and add the
last character to a unigram map.

But this looks very much like a tree...  each node contains the character
(the key) and list of characters that can appear after it.  (I think this is
a trie, but I don't know very much about tries.)  The problem is that lookup
is more horribly inefficient than I can stand, if I use lists.

As for what constitutes convenient, I haven't thought about it that much; I
just noted that my naive attempt was full of boilerplate and decided that
since there was already python code provided I used that.  I'm planning to
port the code *after* I have the assignment finished.

On Fri, Oct 8, 2010 at 11:18 PM, wren ng thornton <wren at freegeek.org> wrote:

> On 10/8/10 5:46 PM, Thomas DuBuisson wrote:
>
>> Alex,
>>
>> The containers library can do this already - there are no constraints
>> on the elements of a Map.  For example:
>>
>>  type TripleNestedMap a = Map Int (Map Char (Map String a))
>>>
>>
>> But this is rather silly as you can just do:
>>
>>  type MapOfTriples a = Map (Int ,Char, String) a
>>>
>>
>> for most uses.
>>
>
> However, Map is a lot less efficient than IntMap when dealing with Ints.
> And the IntMap (IntMap ... a) type requires you to write (lookup m <=< ...
> <=< lookup n) instead of just one lookup. Unfortunately, when you're
> interested in performance issues, the standard tricks for implementing
> polyvariadic functions aren't very useful.
>
> FWIW, the monadic combinators are usually sufficient to create your own
> functions legiblely (e.g., using (<=<) for lookup), but it's still a lot
> noiser than it could be--- especially if you want a trie instead of a
> product map, since you have to add fst and snd everywhere. I've been playing
> around with some ad-hoc tries like these a lot lately, both for HMM tagging
> and for hunting down performance issues in bytestring-trie.
>
> --
> Live well,
> ~wren
>
> _______________________________________________
> Haskell-Cafe mailing list
> Haskell-Cafe at haskell.org
> http://www.haskell.org/mailman/listinfo/haskell-cafe
>



-- 
          Alex R
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://www.haskell.org/pipermail/haskell-cafe/attachments/20101009/3089d34a/attachment.html


More information about the Haskell-Cafe mailing list