[Haskell-cafe] Re: Implementing "unionAll"

Heinrich Apfelmus apfelmus at quantentunnel.de
Tue Feb 16 12:21:53 EST 2010


Leon Smith wrote:
> With the urging and assistance of Omar Antolín Camarena,  I will be
> adding two functions to data-ordlist:  mergeAll and unionAll,  which
> merge (or union)  a potentially infinite list of potentially infinite
> ordered lists,   under the assumption that the heads of the non-empty
> lists appear in a non-decreasing sequence.
> 
> However,  as Omar pointed out to me,  the following implementation of
> unionAll has a flaw:
> 
>> unionAll :: Ord a => [[a]] -> [a]
>> unionAll = foldr (\(x:xs) ys -> x : union xs ys) []
> 
> Namely unionAll [[1,2],[1,2]] should return [1,2],  whereas it
> actually returns [1,1,2].   After some work,  I believe I have
> generalized H. Apfelmus's algorithm to handle this;  however it seems
> a bit complicated.   I would love feedback,  especially with regard to
> simplifications,  bugs,  testing strategies,  and optimizations:
> 
>> unionAll' :: Ord a => [[a]] -> [a]
>> unionAll' = unionAllBy compare
> 
>> data People a = VIP a (People a) | Crowd [a]
> 
>> unionAllBy :: (a -> a -> Ordering) -> [[a]] -> [a]
>> unionAllBy cmp xss = loop [ (VIP x (Crowd xs)) | (x:xs) <- xss ]
>>   where
>>     loop [] = []
>>     loop (  VIP x xs  :  VIP y ys  :  xss  )
>>       = case cmp x y of
>>           LT -> x : loop (  xs  :  VIP y ys  :  xss  )
>>           EQ ->     loop (  VIP x (union' xs ys)  :  unionPairs xss  )
>>           GT -> error "Data.List.Ordered.unionAll:  assumption violated!"
>>     loop (  VIP x xs  :  xss  )
>>       =  x : loop (xs:xss)
>>     loop [Crowd xs] = xs
>>     loop (xs:xss) = loop (unionPairs (xs:xss))
>>
>>     unionPairs [] = []
>>     unionPairs [x] = [x]
>>     unionPairs (x:y:zs) = union' x y : unionPairs zs
>>
>>     union' (VIP x xs) (VIP y ys)
>>        = case cmp x y of
>>            LT -> VIP x (union' xs (VIP y ys))
>>            EQ -> VIP x (union' xs ys)
>>            GT -> error "Data.List.Ordered.unionAll:  assumption violated!"
>>     union' (VIP x xs) (Crowd ys) = VIP x (union' xs (Crowd ys))
>>     union' (Crowd []) ys = ys
>>     union' (Crowd xs) (Crowd ys) = Crowd (unionBy cmp xs ys)
>>     union' xs@(Crowd (x:xt)) ys@(VIP y yt)
>>        = case cmp x y of
>>            LT -> VIP x (union' (Crowd xt) ys)
>>            EQ -> VIP x (union' (Crowd xt) yt)
>>            GT -> VIP y (union' xs yt)

I see no obvious deficiencies. :) Personally, I'd probably structure it like

   http://www.haskell.org/haskellwiki/Prime_numbers#Implicit_Heap

so that your code becomes

   unionAll = serve . foldTree union' . map vip

Your  loop  function is a strange melange of many different concerns
(building a tree, union', adding and removing the VIP constructors).


Note that it's currently unclear to me whether the lazy pattern match in

   pairs ~(x: ~(y:ys)) = f x y : pairs ys

is beneficial or not; you used a strict one

   unionPairs (x:y:zs) = union' x y : unionPairs zs

Daniel Fischer's experiments suggest that the strict one is better

   http://www.mail-archive.com/haskell-cafe@haskell.org/msg69807.html

If you're really concerned about time & space usage, it might even be
worth to abandon the lazy tree altogether and use a heap to achieve the
same effect, similar to Melissa O'Neils prime number code. It's not as
"neat", but much more predictable. :)


Regards,
Heinrich Apfelmus

--
http://apfelmus.nfshost.com



More information about the Haskell-Cafe mailing list