This came up as I was doing homework for natural language processing.<br><br>I'm constructing a trigram model from training data, but I also need the bigram and unigram counts. I could, for each triple of characters, add the 3 tuple to a trigram map and increment its count (I know I'm not actually mutating anything), add the last two letters to a bigram map, and add the last character to a unigram map.<br>
<br>But this looks very much like a tree... each node contains the character (the key) and list of characters that can appear after it. (I think this is a trie, but I don't know very much about tries.) The problem is that lookup is more horribly inefficient than I can stand, if I use lists.<br>
<br>As for what constitutes convenient, I haven't thought about it that much; I just noted that my naive attempt was full of boilerplate and decided that since there was already python code provided I used that. I'm planning to port the code <i>after</i> I have the assignment finished.<br>
<br><div class="gmail_quote">On Fri, Oct 8, 2010 at 11:18 PM, wren ng thornton <span dir="ltr"><<a href="mailto:wren@freegeek.org">wren@freegeek.org</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin: 0pt 0pt 0pt 0.8ex; border-left: 1px solid rgb(204, 204, 204); padding-left: 1ex;">
<div class="im">On 10/8/10 5:46 PM, Thomas DuBuisson wrote:<br>
<blockquote class="gmail_quote" style="margin: 0pt 0pt 0pt 0.8ex; border-left: 1px solid rgb(204, 204, 204); padding-left: 1ex;">
Alex,<br>
<br>
The containers library can do this already - there are no constraints<br>
on the elements of a Map. For example:<br>
<br>
<blockquote class="gmail_quote" style="margin: 0pt 0pt 0pt 0.8ex; border-left: 1px solid rgb(204, 204, 204); padding-left: 1ex;">
type TripleNestedMap a = Map Int (Map Char (Map String a))<br>
</blockquote>
<br>
But this is rather silly as you can just do:<br>
<br>
<blockquote class="gmail_quote" style="margin: 0pt 0pt 0pt 0.8ex; border-left: 1px solid rgb(204, 204, 204); padding-left: 1ex;">
type MapOfTriples a = Map (Int ,Char, String) a<br>
</blockquote>
<br>
for most uses.<br>
</blockquote>
<br></div>
However, Map is a lot less efficient than IntMap when dealing with Ints. And the IntMap (IntMap ... a) type requires you to write (lookup m <=< ... <=< lookup n) instead of just one lookup. Unfortunately, when you're interested in performance issues, the standard tricks for implementing polyvariadic functions aren't very useful.<br>
<br>
FWIW, the monadic combinators are usually sufficient to create your own functions legiblely (e.g., using (<=<) for lookup), but it's still a lot noiser than it could be--- especially if you want a trie instead of a product map, since you have to add fst and snd everywhere. I've been playing around with some ad-hoc tries like these a lot lately, both for HMM tagging and for hunting down performance issues in bytestring-trie.<br>
<br>
-- <br>
Live well,<br><font color="#888888">
~wren</font><div><div></div><div class="h5"><br>
_______________________________________________<br>
Haskell-Cafe mailing list<br>
<a href="mailto:Haskell-Cafe@haskell.org" target="_blank">Haskell-Cafe@haskell.org</a><br>
<a href="http://www.haskell.org/mailman/listinfo/haskell-cafe" target="_blank">http://www.haskell.org/mailman/listinfo/haskell-cafe</a><br>
</div></div></blockquote></div><br><br clear="all"><br>-- <br><div dir="ltr"><div> Alex R</div></div><br>