memory slop (was: Using the GHC heap profiler)

John Lato jwlato at gmail.com
Tue Mar 22 13:07:51 CET 2011


Hi Tim,

Sorry I can't tell you more about slop (I know less than you at this point),
but I do see the problem.  You're reading each line from a Handle as a
String (bad), then creating ByteStrings from that string with BS.pack
(really bad).  You want to read a ByteString (or Data.Text, or other compact
representation) directly from the handle without going through an
intervening string format.  Also, you'll be better off using a real parser
instead of "read", which is very difficult to use robustly.

John L.


> From: Tim Docker <twd2 at dockerz.net>
> Subject: memory slop (was: Using the GHC heap profiler)
> To: glasgow-haskell-users at haskell.org
> Message-ID: <4D895BB0.1080902 at dockerz.net>
> Content-Type: text/plain; charset=ISO-8859-1; format=flowed
>
>
> On Mon, Mar 21, 2011 at 9:59 AM, I wrote:
> >
> > My question on the ghc heap profiler on stack overflow:
> >
> >
> http://stackoverflow.com/questions/5306717/how-should-i-interpret-the-output-of-the-ghc-heap-profiler
> >
> > remains unanswered :-( Perhaps that's not the best forum. Is there
> someone
> > here prepared to explain how the memory usage in the heap profiler
> relates
> > to the  "Live Bytes" count shown in the garbage collection statistics?
>
> I've made a little progress on this. I've simplified my program down to
> a simple executable that loads a bunch of data into an in-memory map,
> and then writes it out again. I've added calls to `seq` to ensure that
> laziness is not causing excessing memory consumption. When I run this on
> my sample data set, it takes ~7 cpu seconds, and uses ~120 MB of vm An
> equivalent python script, takes ~2 secs and ~19MB of vm :-(.
>
> The code is below. I'm mostly concerned with the memory usage rather
> than performance at this stage. What is interesting, is that when I turn
> on garbage collection statistics (+RTS -s), I see this:
>
>   10,089,324,996 bytes allocated in the heap
>      201,018,116 bytes copied during GC
>       12,153,592 bytes maximum residency (8 sample(s))
>       59,325,408 bytes maximum slop
>              114 MB total memory in use (1 MB lost due to fragmentation)
>
>   Generation 0: 19226 collections,     0 parallel,  1.59s,  1.64selapsed
>   Generation 1:     8 collections,     0 parallel,  0.04s,  0.04selapsed
>
>   INIT  time    0.00s  (  0.00s elapsed)
>   MUT   time    5.84s  (  5.96s elapsed)
>   GC    time    1.63s  (  1.68s elapsed)
>   EXIT  time    0.00s  (  0.00s elapsed)
>   Total time    7.47s  (  7.64s elapsed)
>
>   %GC time      21.8%  (22.0% elapsed)
>
>   Alloc rate    1,726,702,840 bytes per MUT second
>
>   Productivity  78.2% of total user, 76.5% of total elapsed
>
> This seems strange. The maximum residency of 12MB sounds about correct
> for my data. But what's with the 59MB of "slop"? According to the ghc docs:
>
> | The "bytes maximum slop" tells you the most space that is ever wasted
> | due to the way GHC allocates memory in blocks. Slop is memory at the
> | end of a block that was wasted. There's no way to control this; we
> | just like to see how much memory is being lost this way.
>
> There's this page also:
>
> http://hackage.haskell.org/trac/ghc/wiki/Commentary/Rts/Storage/Slop
>
> but it doesn't really make things clearer for me.
>
> Is the slop number above likely to be a significant contribution to net
> memory usage? Are there any obvious reasons why the code below could be
> generating so much? The data file in question has 61k lines, and is <6MB
> in total.
>
> Thanks,
>
> Tim
>
> -------- Map2.hs --------------------------------------------
>
> module Main where
>
> import qualified Data.Map as Map
> import qualified Data.ByteString.Char8 as BS
> import System.Environment
> import System.IO
>
> type MyMap = Map.Map BS.ByteString BS.ByteString
>
> foldLines :: (a -> String -> a) -> a -> Handle -> IO a
> foldLines f a h = do
>     eof <- hIsEOF h
>     if eof
>       then (return a)
>       else do
>          l <- hGetLine h
>          let a' = f a l
>          a' `seq` foldLines f a' h
>
> undumpFile :: FilePath -> IO MyMap
> undumpFile path = do
>     h <- openFile path ReadMode
>     m <- foldLines addv Map.empty h
>     hClose h
>     return m
>   where
>     addv m "" = m
>     addv m s = let (k,v) = readKV s
>                in k `seq` v `seq` Map.insert k v m
>
>     readKV s = let (ks,vs) = read s in (BS.pack ks, BS.pack vs)
>
> dump :: [(BS.ByteString,BS.ByteString)] -> IO ()
> dump vs = mapM_ putV vs
>   where
>     putV (k,v) = putStrLn (show (BS.unpack k, BS.unpack v))
>
> main :: IO ()
> main =  do
>     args <- getArgs
>     case args of
>       [path] -> do
>           v <- undumpFile path
>           dump (Map.toList v)
>           return ()
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.haskell.org/pipermail/glasgow-haskell-users/attachments/20110322/f48589fa/attachment.htm>


More information about the Glasgow-haskell-users mailing list