[Haskell-cafe] Re: Grouping - Map / Reduce

Gü?nther Schmidt gue.schmidt at web.de
Tue Mar 24 20:20:47 EDT 2009


Dear Luke,

I suspect Black Magic at work here.

This seems to work and I so don't have a clue why. But thank you very 
much nevertheless, I strongly suspect that, once I figured out why this 
works, I will have learned a very, very important trick indeed.

Had I read "purely functional data structures" from start to finish, 
would I have come across this?

Günther



Luke Palmer schrieb:
> On Tue, Mar 24, 2009 at 3:51 PM, Luke Palmer <lrpalmer at gmail.com 
> <mailto:lrpalmer at gmail.com>> wrote:
> 
>     On Tue, Mar 24, 2009 at 3:15 PM, Gü?nther Schmidt
>     <gue.schmidt at web.de <mailto:gue.schmidt at web.de>> wrote:
> 
>         Hi,
> 
>         let say I got an unordered lazy list of key/value pairs like
> 
>         [('a', 99), ('x', 42), ('a', 33) ... ]
> 
>         and I need to sum up all the values with the same keys.
> 
>         So far I wrote a naive implementation, using Data.Map, foldl and
>         insertWith..
> 
>         The result of this grouping operation, which is effectively
>         another list
>         of key/value pairs, just sums this time, needs to be further
>         processed.
> 
>         The building of this map is of course a bottleneck, the successive
>         processing needs to wait until the entire list is eventually
>         consumed
>         the Map is built and flattened again.
> 
>         Is there another way of doing this, something more "streaming
>         architecture" like?
> 
> 
>     Yeah, make a trie.  Here's a quick example.
> 
>     import Data.Monoid
> 
>     newtype IntTrie a = IntTrie [a]
> 
>     singleton :: (Monoid a) => Int -> a -> IntTrie a
>     singleton ch x = IntTrie [ if fromIntegral ch == i then x else
>     mempty | i <- [0..] ]
> 
> 
> This definition of singleton unnecessarily leaks memory in some cases.  
> Here's a better one:
> 
> singleton ch x = IntTrie $ replicate ch mempty ++ [x] ++ repeat mempty
> 
> Luke
> 
> 
> 
>     lookupTrie :: IntTrie a -> Int -> a
>     lookupTrie (IntTrie xs) n = xs !! n
> 
>     instance (Monoid a) => Monoid (IntTrie a) where
>         mempty = IntTrie (repeat mempty)
>         mappend (IntTrie xs) (IntTrie ys) = IntTrie (infZipWith mappend
>     xs ys)
> 
>     infZipWith f ~(x:xs) ~(y:ys) = f x y : infZipWith f xs ys
> 
>     test = mconcat [ singleton (n `mod` 42) [n] | n <- [0..] ]
>     `lookupTrie` 10
> 
>     This is an inefficient way to find the class of n such that n mod 42
>     = 10.  Note that it works on an infinite list of inputs.
> 
>     Here the "trie" was a simple list, but you could replace it with a
>     more advanced data structure for better performace.
> 
>     Luke
> 
> 
> 
> ------------------------------------------------------------------------
> 
> _______________________________________________
> Haskell-Cafe mailing list
> Haskell-Cafe at haskell.org
> http://www.haskell.org/mailman/listinfo/haskell-cafe




More information about the Haskell-Cafe mailing list