[Haskell-beginners] Unexpected Space Leak (despite using external-sort)

Stephen Blackheath [to Haskell-Beginners] mutilating.cauliflowers.stephen at blacksapphire.com
Mon Jan 4 00:41:56 EST 2010


Peter,

See below...
> Thanks Steve. I think I understand your point. It seems like
> external-sort needs patching to make block comparison more efficient for
> everyone and that my case is just particularly extreme/degenerate. Just
> right now, I'm not yet confident enough about my Haskell abilities to go
> and create a new type for 'block' with more efficient Ord and then
> modify the rest of the external-sort code accordingly. Posted program
> was the 2nd Haskell prog I've every written. I'll get there in time,
> though :).

Laziness and space analysis make up a significant part of Haskell's
learning curve, I believe, so you're doing well.

Here's how I'm suggesting you change kMerge

> kMerge :: (Ord a) => [[a]] -> [a]
> kMerge [] = []
> kMerge l =
>     let h = Splay.fromSeq l in
>     kM (Splay.minElem h) (Splay.deleteMin h)
>     where
>     kM :: (Ord a) => [a] -> Splay.Heap [a] -> [a]
>     kM l h
>         | h == Splay.empty = l
>         | otherwise =
>             let next = Splay.minElem h
>                 (f, b) = span (\x -> x <= head next) l
>             in
>             f ++ (kM next (if null b then Splay.deleteMin h
>                            else (Splay.insert b $ Splay.deleteMin h)))

becomes something like... (with lots of wrapping and unwrapping)...

> newtype HeadCompList a = HeadCompList [a]
>
> instance Ord a => Ord (HeadCompList a) where
>     HeadCompList (x:_) `compare` HeadCompList (y:_) = x `compare` y
-- compare head elts only
>
> kMerge :: (Ord a) => [[a]] -> [a]
> kMerge [] = []
> kMerge l =
>     let wrappedL = map HeadCompList l
>          h = Splay.fromSeq wrappedL in
>     kM (Splay.minElem h) (Splay.deleteMin h)
>     where
>     kM :: (Ord a) => HeadCompList a -> Splay.Heap (HeadCompList a) -> [a]
>     kM (HeadCompList l) h
>         | h == Splay.empty = l
>         | otherwise =
>             let (HeadCompList next) = Splay.minElem h
>                 (f, b) = span (\x -> x <= head next) l
>             in
>             f ++ (kM next (if null b then Splay.deleteMin h
>                            else (Splay.insert (HeadCompList b) $
Splay.deleteMin h)))

> I wonder if there is another way of making my code work better with the
> existing externalSort function:
> 
> I am sorting [[Int]], an example of which might be:
> 
> [[8, 11, 7, 10],
> [8, 11, 13, 10],
> [12, 11, 7, 10],
> [12, 11, 13, 10],
> [1, 1, 3, 4],
> .
> .
> [3, 9, 6, 4]]
> 
> I need numerical lex sorting, so I can't just map directly to ByteString
> as this would result in "12" collating before "1", etc.
> 
> I know that each sublist I am sorting never contains an integer greater
> than 20. I could encode each sublist as a ByteString where 1 maps to a,
> 2 maps to B, 3, maps to C, etc.
> 
> So would have [1,1,3,4] -> "aacd" (ByteString), etc.
> 
> Hopefully Ord comparisons between ByteStrings are much less expensive
> than comparisons between instances of  [Int] and I will see nicer
> behaviour since external-sort does work as advertised on lists of
> millions of Ints?

Well, if I'm right, then the problem is that kMerge is evaluating too
much of each of its ... in your case .. [[Int]] values.  You're
proposing to change it to [ByteString].  Again, if I'm right, that would
improve it by exactly the factor of space usage by ByteString / [Int],
without solving the fundamental problem.

> So I could sort these, then map back to underlying Int data when I
> output. Schwartzian transform type stuff.
> 
> Anyway, I'll give it a go.

I may be wrong!


Steve


More information about the Beginners mailing list