<div class="gmail_quote"><div>Hi Tim,</div><div><br></div><div>Sorry, I can't tell you more about slop (I know less than you at this point), but I do see the problem. You're reading each line from a Handle as a String (bad: a String is a lazy linked list of Char, costing several machine words per character), then creating ByteStrings from that String with BS.pack (really bad). You want to read a ByteString (or Data.Text, or another compact representation) directly from the handle without going through an intermediate String. Also, you'll be better off using a real parser instead of "read", which is very difficult to use robustly.</div>
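<div>A minimal sketch of that advice, assuming Tim's file format (one <code>show (k, v)</code> pair of Strings per line) and the <code>bytestring</code> and <code>containers</code> packages. The whole file is read as one strict ByteString with <code>BS.readFile</code> and split with <code>BS.lines</code>; a small hand-written parser (<code>parseStr</code> and <code>parseKV</code> are hypothetical names, not library functions) slices keys and values out directly and falls back to <code>reads</code> only for literals that contain escape sequences. Unlike the original, malformed lines are skipped rather than raising an error and, as with the original <code>BS.pack</code>, characters above U+00FF are truncated to 8 bits. It uses <code>Data.Map.Strict</code> (containers 0.5 or later), so values are forced on insertion without explicit <code>seq</code>.</div>

```haskell
import qualified Data.ByteString.Char8 as BS
import qualified Data.Map.Strict as Map
import Data.List (foldl')

type MyMap = Map.Map BS.ByteString BS.ByteString

-- Hypothetical helper: parse one Haskell string literal, as produced by
-- 'show', from the front of the input. Returns the decoded contents and
-- the remaining input.
parseStr :: BS.ByteString -> Maybe (BS.ByteString, BS.ByteString)
parseStr s = do
  ('"', body) <- BS.uncons s
  -- Fast path: scan to the first quote or backslash without unpacking.
  let (lit, rest) = BS.break (\c -> c == '"' || c == '\\') body
  case BS.uncons rest of
    -- No escapes: the literal is exactly the bytes before the closing quote.
    Just ('"', rest') -> Just (lit, rest')
    -- Escapes present: fall back to 'reads' to decode this one literal.
    Just ('\\', _) -> case reads (BS.unpack s) of
      [(str, rest')] -> Just (BS.pack str, BS.pack rest')
      _              -> Nothing
    _ -> Nothing

-- Hypothetical helper: parse a line of the form ("key","value"),
-- i.e. the output of show (k, v) for two Strings.
parseKV :: BS.ByteString -> Maybe (BS.ByteString, BS.ByteString)
parseKV line = do
  ('(', r0) <- BS.uncons line
  (k, r1)   <- parseStr r0
  (',', r2) <- BS.uncons r1
  (v, r3)   <- parseStr r2
  (')', _)  <- BS.uncons r3
  return (k, v)

-- Read the whole file as one strict ByteString and build the map.
undumpFile :: FilePath -> IO MyMap
undumpFile path = do
    contents <- BS.readFile path
    return $! foldl' addv Map.empty (BS.lines contents)
  where
    -- Blank or malformed lines are skipped rather than raising an error.
    -- BS.copy detaches each key and value from the file buffer.
    addv m line = case parseKV line of
      Just (k, v) -> Map.insert (BS.copy k) (BS.copy v) m
      Nothing     -> m
```

<div>The <code>BS.copy</code> calls are a deliberate choice: without them every key and value would be a slice of the single ~6MB file buffer, keeping all of it alive for as long as the map is.</div>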
<div><br></div><div>John L.</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;">From: Tim Docker <<a href="mailto:twd2@dockerz.net">twd2@dockerz.net</a>><br>
Subject: memory slop (was: Using the GHC heap profiler)<br>
To: <a href="mailto:glasgow-haskell-users@haskell.org">glasgow-haskell-users@haskell.org</a><br>
Message-ID: <<a href="mailto:4D895BB0.1080902@dockerz.net">4D895BB0.1080902@dockerz.net</a>><br>
<br>
<br>
On Mon, Mar 21, 2011 at 9:59 AM, I wrote:<br>
><br>
> My question on the ghc heap profiler on stack overflow:<br>
><br>
> <a href="http://stackoverflow.com/questions/5306717/how-should-i-interpret-the-output-of-the-ghc-heap-profiler" target="_blank">http://stackoverflow.com/questions/5306717/how-should-i-interpret-the-output-of-the-ghc-heap-profiler</a><br>
><br>
> remains unanswered :-( Perhaps that's not the best forum. Is there someone<br>
> here prepared to explain how the memory usage in the heap profiler relates<br>
> to the "Live Bytes" count shown in the garbage collection statistics?<br>
<br>
I've made a little progress on this. I've simplified my program down to<br>
a small executable that loads a bunch of data into an in-memory map<br>
and then writes it out again. I've added calls to `seq` to ensure that<br>
laziness is not causing excessive memory consumption. When I run this on<br>
my sample data set, it takes ~7 CPU seconds and uses ~120 MB of VM. An<br>
equivalent Python script takes ~2 secs and ~19 MB of VM :-(.<br>
<br>
The code is below. I'm mostly concerned with the memory usage rather<br>
than performance at this stage. What is interesting is that when I turn<br>
on garbage collection statistics (+RTS -s), I see this:<br>
<br>
10,089,324,996 bytes allocated in the heap<br>
201,018,116 bytes copied during GC<br>
12,153,592 bytes maximum residency (8 sample(s))<br>
59,325,408 bytes maximum slop<br>
114 MB total memory in use (1 MB lost due to fragmentation)<br>
<br>
Generation 0: 19226 collections, 0 parallel, 1.59s, 1.64s elapsed<br>
Generation 1: 8 collections, 0 parallel, 0.04s, 0.04s elapsed<br>
<br>
INIT time 0.00s ( 0.00s elapsed)<br>
MUT time 5.84s ( 5.96s elapsed)<br>
GC time 1.63s ( 1.68s elapsed)<br>
EXIT time 0.00s ( 0.00s elapsed)<br>
Total time 7.47s ( 7.64s elapsed)<br>
<br>
%GC time 21.8% (22.0% elapsed)<br>
<br>
Alloc rate 1,726,702,840 bytes per MUT second<br>
<br>
Productivity 78.2% of total user, 76.5% of total elapsed<br>
<br>
This seems strange. The maximum residency of 12MB sounds about correct<br>
for my data. But what's with the 59MB of "slop"? According to the ghc docs:<br>
<br>
| The "bytes maximum slop" tells you the most space that is ever wasted<br>
| due to the way GHC allocates memory in blocks. Slop is memory at the<br>
| end of a block that was wasted. There's no way to control this; we<br>
| just like to see how much memory is being lost this way.<br>
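On GHC 8.2 and later, the peak-so-far values behind the "maximum residency", "maximum slop" and "total memory in use" lines of +RTS -s can also be read from inside the program through the GHC.Stats module, which makes it possible to log slop next to residency at a chosen point (for example, just after the map is built). A minimal sketch; MemSummary and memSummary are hypothetical names, and the program must be run with statistics collection enabled (e.g. +RTS -T), otherwise it returns Nothing.<br>

```haskell
import GHC.Stats (RTSStats (..), getRTSStats, getRTSStatsEnabled)
import Data.Word (Word64)

-- MemSummary and memSummary are illustrative names, not part of GHC.
-- Each field is a peak-so-far figure, in bytes, taken from RTSStats.
data MemSummary = MemSummary
  { peakLive  :: Word64  -- as in "bytes maximum residency"
  , peakSlop  :: Word64  -- as in "bytes maximum slop"
  , peakInUse :: Word64  -- as in "total memory in use" (printed there in MB)
  } deriving Show

-- Returns Nothing when the RTS is not collecting statistics
-- (for example, when the program was started without +RTS -T).
memSummary :: IO (Maybe MemSummary)
memSummary = do
  enabled <- getRTSStatsEnabled
  if not enabled
    then return Nothing
    else do
      s <- getRTSStats
      return (Just (MemSummary (max_live_bytes s)
                               (max_slop_bytes s)
                               (max_mem_in_use_bytes s)))
```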
<br>
There's this page also:<br>
<br>
<a href="http://hackage.haskell.org/trac/ghc/wiki/Commentary/Rts/Storage/Slop" target="_blank">http://hackage.haskell.org/trac/ghc/wiki/Commentary/Rts/Storage/Slop</a><br>
<br>
but it doesn't really make things clearer for me.<br>
<br>
Is the slop number above likely to be a significant contribution to net<br>
memory usage? Are there any obvious reasons why the code below could be<br>
generating so much? The data file in question has 61k lines, and is <6MB<br>
in total.<br>
<br>
Thanks,<br>
<br>
Tim<br>
<br>
-------- Map2.hs --------------------------------------------<br>
<br>
<pre>module Main where

import qualified Data.Map as Map
import qualified Data.ByteString.Char8 as BS
import System.Environment
import System.IO

type MyMap = Map.Map BS.ByteString BS.ByteString

-- Strict left fold over the lines of a handle; each line is read as a String.
foldLines :: (a -> String -> a) -> a -> Handle -> IO a
foldLines f a h = do
    eof <- hIsEOF h
    if eof
      then return a
      else do
        l <- hGetLine h
        let a' = f a l
        a' `seq` foldLines f a' h

undumpFile :: FilePath -> IO MyMap
undumpFile path = do
    h <- openFile path ReadMode
    m <- foldLines addv Map.empty h
    hClose h
    return m
  where
    addv m "" = m
    addv m s  = let (k,v) = readKV s
                in k `seq` v `seq` Map.insert k v m

    -- Each line holds a shown (String, String) pair, decoded with read.
    readKV s = let (ks,vs) = read s in (BS.pack ks, BS.pack vs)

dump :: [(BS.ByteString,BS.ByteString)] -> IO ()
dump vs = mapM_ putV vs
  where
    putV (k,v) = putStrLn (show (BS.unpack k, BS.unpack v))

main :: IO ()
main = do
    args <- getArgs
    case args of
      [path] -> do
        v <- undumpFile path
        dump (Map.toList v)
        return ()
      _ -> hPutStrLn stderr "usage: Map2 FILE"</pre></blockquote></div>