[Haskell-cafe] Can't find space leak in external sort with tournament trees (test case attached)

Denis Bueno dbueno at gmail.com
Tue Apr 21 23:12:45 EDT 2009


Hello haskell-cafe,

I'm running into what I think is a space leak.  I've modified
external-sort-0.2 [0] to use tournament trees for merging, and
although the block sorting works in constant space, merging seems to
produce a space leak.

The external sort works by lazily consuming an input list, chopping it
up into fixed size blocks, sorting each block in memory and writing
each sorted block out to disk.  At this point we have a big, on-disk
array of k sorted blocks.  To produce the final, sorted output list,
we lazily read back each block into a tournament tree (a heap of
sorts), which should only compare the first element of block to figure
out the tree root.  The merging algorithm in full is in `tourMerge'
[1].  I've confirmed by profiling (with -hc) that the merge step
gobbles up memory.

I've attached a self-contained program (two haskell source files).  It
requires binary, mersenne-random*, and ArrayRef from hackage.

    $ tar xvzf leak.tar.gz
    leak.hs
    Data/TournamentTree.hs
    $ ghc -O --make leak.hs
    $ ./leak 5000000 # generate 5 million entries in ./entries.tmp,
sort them, and output to ./leak.out
    Writing 5000000 entries to './entries.tmp'
    Done.  Henceforth, if you omit the number argument to leak it will
sort './entries.tmp'.
    Sorting 5000000 entries from './entries.tmp'
    External block sort using 1000 blocks:
      -- 5000 entries per block
      -- 240000 bytes per block
    1000--blocks sorted.
    Merging blocks and writing sorted entries to './leak.out'

Depending on how much RAM you have, you may need to tweak the argument
to leak and/or numBlocks (at the top of leak.hs) -- a block needs to
be able to fit in RAM.  If the argument to leak is too low, the sort
will succeed and the leak won't be apparent.  Otherwise, the OS will
kill the process because it allocated too much.

Anyone have an idea how I can fix the leak?
                              Denis



*Only required to generate the initial input file to sort; it's much
faster than System.Random.

[0] http://hackage.haskell.org/cgi-bin/hackage-scripts/package/external-sort-0.2

[1]
tourMerge is in leak.hs, but I've also inlined it here for
convenience.  TT is the Data.TournamentTree module, included in the
tarball I've attached.

tourMerge :: (Ord a) => [[a]] -> [a]
tourMerge []   = []
tourMerge blks = let t = {-SCC "TMfromList" #-}
                         TT.fromList blks
                 in tM (TT.minElem t) (TT.deleteMinF t)
  where
  tM :: (Ord a) => [a] -> TT.Forest [a] -> [a]
  tM l f
     | null f = l
     | otherwise =
         let t      = TT.tournament f
             next   = TT.minElem t
             (x, xs) = {-# SCC "TTspan" #-} span (<= head next) l
         in
         x ++ (tM next
                  (if null xs then TT.deleteMinF t
                   else (TT.singleton xs : TT.deleteMinF t)))
-------------- next part --------------
A non-text attachment was scrubbed...
Name: leak.tar.gz
Type: application/x-gzip
Size: 4804 bytes
Desc: not available
Url : http://www.haskell.org/pipermail/haskell-cafe/attachments/20090421/96f49dda/leak.tar.bin


More information about the Haskell-Cafe mailing list