[Haskell-cafe] Space leak with Data.Binary and decodeFile

Maxime Henrion mhenrion at gmail.com
Sun Jan 11 17:00:09 EST 2009

    Hello all,

I've been observing a huge space leak with some code using Data.Binary
that I cannot make sense of and I hope someone here can shed some light
on this, so I'll try to explain my problem as clearly as possible.  I
call the space leak huge because, if I let the program run, it soon
consumes all available memory (~3G) and is finally killed by the
system.

The code I'm writing implements a search algorithm using an inverted
index.  This index is built from a Trie [Int] (from the bytestring-trie
package) and an Array Int ByteString.  The trie maps each referenced
word to an integer list that is a list of indices into the array.  Here
is the code for the Index datatype and the obvious Binary instance:

data Index = Index { entries :: Array Int ByteString
                   , invidx  :: Trie [Int]
                   }

instance Binary Index where
    put (Index dirs idx) = put dirs >> put idx
    get = liftM2 Index get get
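
A minimal sketch of the same kind of instance, round-tripped in memory
rather than through a file.  It assumes Data.Map as a stand-in for the
Trie (only so that the bytestring-trie package is not needed), and the
sample value is made up for illustration; it is not the real index.

```haskell
import Control.Monad (liftM2)
import Data.Array (Array, elems, listArray)
import Data.Binary (Binary (..), decode, encode)
import Data.ByteString (ByteString)
import qualified Data.ByteString.Char8 as B
import qualified Data.Map as M

-- Stand-in for the real Index: Map replaces Trie here (an assumption,
-- made only to avoid the bytestring-trie dependency).
data Index = Index { entries :: Array Int ByteString
                   , invidx  :: M.Map ByteString [Int]
                   }

instance Binary Index where
    put (Index dirs idx) = put dirs >> put idx
    get = liftM2 Index get get

-- Hypothetical sample data: two entries, each referenced by one word.
sample :: Index
sample = Index (listArray (0, 1) (map B.pack ["foo.txt", "bar.txt"]))
               (M.fromList [(B.pack "foo", [0]), (B.pack "bar", [1])])

main :: IO ()
main = do
    -- Serialize to a lazy ByteString and decode it straight back.
    let idx = decode (encode sample) :: Index
    mapM_ B.putStrLn (elems (entries idx))
    print (M.lookup (B.pack "bar") (invidx idx))
```

With an index this small nothing leaks, of course; the sketch only
shows the shape of the instance and how it is exercised.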

I have no problems creating and serializing this data structure to a
file.  The huge leak appears instead when I read this data structure
back from a file and try to do something with it.

This is the smallest test case I came up with that reproduces the
problem:

main = do
    idx <- decodeFile "list.idx"
    mapM_ (B.putStrLn . snd) (assocs (entries idx))

The space leak also appears when I try to touch the trie instead of the
array.  I've been trying tons of combinations involving adding or
removing strictness annotations and seq calls in various places with no
luck.  I have also been adding SCC annotations and tried to profile the
code.  This seemed to suggest the space leak happens in the get method
of the Array instance of Binary:

instance (Binary i, Ix i, Binary e) => Binary (Array i e) where
    get = do
        bs <- get
        n  <- get                  -- read the length
        xs <- replicateM n get     -- now the elems.
        return (listArray bs xs)
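
To make the three steps of this get concrete, here is a small sketch
that decodes an encoded array by hand.  It assumes the matching put
writes the bounds, then the element count, then the elements in order,
which is how I read the binary source; rawFields and sampleArr are
names invented for this example.

```haskell
import Control.Monad (replicateM)
import Data.Array (Array, listArray)
import Data.Binary (decode, encode, get)
import Data.Binary.Get (Get, runGet)

-- Read the same three fields as the Array instance, but return them
-- separately instead of building an array.
rawFields :: Get ((Int, Int), Int, [Int])
rawFields = do
    bs <- get                  -- the bounds
    n  <- get                  -- the length
    xs <- replicateM n get     -- the elems
    return (bs, n, xs)

-- Hypothetical three-element array used only for this demonstration.
sampleArr :: Array Int Int
sampleArr = listArray (0, 2) [10, 20, 30]

main :: IO ()
main = do
    print (runGet rawFields (encode sampleArr))
    -- The library instance should give back the original array.
    print (decode (encode sampleArr) == sampleArr)
```

Note that replicateM has to run n separate get actions before
listArray sees any element, which is the expression the profiler
points at.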

The output of the profiler tells me that all the space gets allocated
from the "replicateM n get" expression.

Now for the really weird part: if I load my code in GHCi and type
"main", I can observe the space leak.  However, if I copy and paste the
definition of main instead, the code runs fine!  This is the only
circumstance in which I've seen this code work instead of eating all
the RAM...

I have been using GHC 6.10.1, binary 0.4.4 and bytestring-trie 0.1.2.

If there's anything else that I can do to understand what's going on, I
would gladly hear about it.  Please also tell me if I should provide
more information.

Maxime Henrion
