[Haskell-cafe] Optimization problem

Wed Sep 13 19:10:59 EDT 2006

Dear Haskell Cafe,

When programming the other day I ran into this problem. What I want to do 
is a function that would work like this:

splitStreams::Ord a=>[(a,b)]->[(a,[b])]

> splitStreams [(3,x),(1,y),(3,z),(2,w)]
[(3,[x,z]),(1,[y]),(2,[w])]

I don't care about the order that the pairs are output, so the answer 
could just as well be [(2,[w],(3,[x,z]),(1,[y])]. However I do care about 
the order of the xyzw:s, so (3,[z,x]) could not be part of the solution in 
this example.

Furthermore it should work on infinite lists. It can't eat the whole 
list before producing any output.

Now, it's not too hard to come up with an inefficient solution that 
traverses the input list multiple times. For example a sieving solution:

import Data.List

splitStreams [] = []
splitStreams ((channel,msg):rest) =
     let (myMsgs,otherMsgs) =
           partition (\(c,_)->c==channel) rest
     in (channel, msg : map snd myMsgs) : splitStreams otherMsgs

I'm afraid this algorithm is O(n*m) time in the worst case, where n is the 
length of the input list, and m is the number of unique channels.

But is there any way to write it such that each element is touched only 
once? Or at least an O(n*log(m)) algorithm?

Any takers?

/ Magnus Jonsson