[Haskell-beginners] Diagnosing : Large memory usage + low CPU

Hugo Ferreira hmf at inescporto.pt
Mon Dec 5 14:57:58 CET 2011


Hello,

First and foremost, thanks for the link, Edward. I have read through your
material.

On 12/05/2011 06:29 AM, Edward Z. Yang wrote:
> Excerpts from Hugo Ferreira's message of Fri Dec 02 05:57:00 -0500 2011:
>> I have attached a profiling session (showing types).
>> I am surprised to see that the "[]" consumes so much data.
>> Where is this coming from? Need to analyse this more closely.
>
> For an -hT profile, what that actually means is your lists are using lots
> of memory.
>

Funnily enough, I cannot get this option to work. All the other -h options
work fine, though.

>> Any idea how I can track what's generating all those "[]" ?
>> Note that the (,,) seems to be the NGramTag data, which is basically
>> used as a list (Zipper).
>
> For that, I recommend rebuilding with profiling and use the RTS flag -hc.
>

OK, so I ran this as follows:

time nice -n 19 ./postagger +RTS -p -hc -L50  &> tmp19.txt
hp2ps -e8in -c postagger.hp

Now I see that "rsplit_" seems to be the culprit for the initial peaks
in memory use. However, the profile also shows that this function
accounts for only a small share of the total memory allocated.
Why is there such a big discrepancy between the live heap and the
profile's total memory?

Another question: how can I change the code below to avoid such
a peak? I have already added ! (bang patterns), to no avail.

rsplit :: Eq a => a -> [a] -> ([a], [a])
rsplit sep l = let (ps, xs, _) = rsplit_ sep l in
                (ps, xs)

rsplit_ :: Eq a => a -> [a] -> ([a], [a], Bool)
rsplit_ sep l = foldr (splitFun sep) ([], [], False) l
   where splitFun _ e !a@(!px, !xs, True) = (e:px, xs, True)
         splitFun sep e !a@(!px, !xs, False)
                  | e == sep = (px, xs, True)
                  | otherwise = (px, e:xs, False)
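
A note on the bangs above, as a minimal sketch rather than a diagnosis: a
bang pattern forces a value only to weak head normal form, that is, to its
outermost constructor. In splitFun the tuple pattern already does that, and
foldr still has to walk to the end of the list before the first match on the
accumulator can succeed, so the bangs would not be expected to change the
shape of the heap. The names constZero and demo below are made up for
illustration.

```haskell
{-# LANGUAGE BangPatterns #-}

-- | Ignores its argument, but the bang forces it to WHNF first.
constZero :: [Int] -> Int
constZero !_xs = 0

-- | Forcing the list to WHNF only exposes the first (:) cell; the tail is
-- never inspected, so the 'undefined' in it is never evaluated.
demo :: Int
demo = constZero (1 : undefined)
```

Evaluating demo yields 0 without touching the undefined tail, which shows
how shallow the forcing is.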

toTrainingInstance' :: String -> NGramTag
toTrainingInstance' s = let (token, tag) = rsplit '/' s in
                         (token, tag, "")

toTrainingCorpus s = let (token, tag) = rsplit '/' s in
                      (token, tag, "")
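
For comparison, here is a minimal sketch of the same split written with
break on the reversed list instead of a foldr over a triple. The name
rsplitAlt is hypothetical, and whether it removes the peak on the real
corpus is untested; it is only meant to give the same results as rsplit
(split at the last separator, everything in the second component when there
is no separator).

```haskell
-- | Split a list at the LAST occurrence of the separator.
-- If the separator does not occur, the prefix is empty and the
-- whole input ends up in the second component, as with rsplit.
rsplitAlt :: Eq a => a -> [a] -> ([a], [a])
rsplitAlt sep l =
  case break (== sep) (reverse l) of
    -- No separator found: everything is "suffix".
    (revSuffix, [])            -> ([], reverse revSuffix)
    -- Separator found: drop it and un-reverse both halves.
    (revSuffix, _ : revPrefix) -> (reverse revPrefix, reverse revSuffix)
```

For a token such as "the/at" this gives ("the", "at"), and a token that
itself contains slashes is split only at the final one.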


evalTaggers' _ = do
   h <- IO.openFile "brown-pos-train.txt" IO.ReadMode
   c <- IO.hGetContents h

   let train = toTrainingInstances $ map toTrainingInstance' $ words c
   ....
   i <- IO.openFile "brown-pos-test.txt" IO.ReadMode
   d <- IO.hGetContents i
   let test = Z.fromList $ map toTrainingCorpus $ words d
   ...
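
Since the -hT profile points at list cells, one direction that might be
worth trying is reading the corpus as a strict ByteString instead of a
String, which stores one byte per character rather than several machine
words. The sketch below assumes ASCII input and uses the hypothetical names
rsplitBS and readPairs; it is not the code from this program.

```haskell
import qualified Data.ByteString.Char8 as B

-- | Split at the last occurrence of the separator, mirroring rsplit.
rsplitBS :: Char -> B.ByteString -> (B.ByteString, B.ByteString)
rsplitBS sep s =
  let (prefixWithSep, suffix) = B.breakEnd (== sep) s
  in if B.null prefixWithSep
       then (B.empty, suffix)               -- no separator in the input
       else (B.init prefixWithSep, suffix)  -- drop the trailing separator

-- | Hypothetical helper: read a whole file strictly and split each token.
readPairs :: FilePath -> IO [(B.ByteString, B.ByteString)]
readPairs path = do
  contents <- B.readFile path
  return (map (rsplitBS '/') (B.words contents))
```

The resulting pieces are slices that share the buffer of the file contents,
so keeping any of them alive keeps the whole buffer alive; B.copy can be
used on pieces that need to outlive it.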

Anyone see an obvious change that needs to be made?

TIA,
Hugo F.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: postagger.ps
Type: application/postscript
Size: 111290 bytes
Desc: not available
URL: <http://www.haskell.org/pipermail/beginners/attachments/20111205/302bbea0/attachment-0001.ps>
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: postagger.prof
URL: <http://www.haskell.org/pipermail/beginners/attachments/20111205/302bbea0/attachment-0001.asc>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: postagger.hp
Type: text/x-c++hdr
Size: 256304 bytes
Desc: not available
URL: <http://www.haskell.org/pipermail/beginners/attachments/20111205/302bbea0/attachment-0001.hpp>

