[Haskell-beginners] Diagnosing : Large memory usage + low CPU

Hugo Ferreira hmf at inescporto.pt
Fri Dec 2 11:57:00 CET 2011


Hi Edward,

On 12/01/2011 07:55 AM, Edward Z. Yang wrote:
> Hello Hugo,
>
> Can you do a heap profile (+RTS -hT, or maybe use one of the other
> options if you've got a profiling copy lying around)?

I have attached a profiling session (showing types).
I am surprised to see that "[]" accounts for so much of the heap.
Where is this coming from? I need to analyse this more closely.
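
One possible source, offered as a guess rather than a diagnosis: Token and
Tag are both String, and a String is a list of Char, so in a by-type profile
every character of every token is counted under "[]". The sketch below
estimates the cost using the figures from Johan Tibell's memory-footprint
post linked further down (3 words per cons cell, 2 words per boxed Char).
The function names and the 64-bit word size are assumptions made for
illustration only.

```haskell
module Main where

-- Assumed machine word size in bytes (64-bit GHC).
wordBytes :: Int
wordBytes = 8

-- Rough upper bound on the heap footprint of a fully evaluated String:
-- each character costs one cons cell (3 words) plus, in the worst case,
-- one boxed Char (2 words).
stringBytesUpper :: String -> Int
stringBytesUpper s = length s * 5 * wordBytes

-- Lower bound when the boxed Chars are shared rather than allocated
-- per occurrence, so only the cons cells are counted.
stringBytesLower :: String -> Int
stringBytesLower s = length s * 3 * wordBytes

main :: IO ()
main = do
  let token = "profiling"
  putStrLn ("characters:  " ++ show (length token))
  putStrLn ("lower bound: " ++ show (stringBytesLower token) ++ " bytes")
  putStrLn ("upper bound: " ++ show (stringBytesUpper token) ++ " bytes")
```

If the 50 MB estimate quoted further down counts roughly one byte per
character, the same text held as String could take on the order of 1 to 2 GB
on these figures, before counting the tuples and the zipper's own cells.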

> Try using
> smaller data if it's taking too long; usually the profile will still
> look the same, unless it's a particular type of input that is triggering
> bad behavior.
>

The profile above is for test data that is about 1/5 the size of the original data.

> There is not enough detail in your code for me to use my psychic
> debugging skills, unfortunately.
>

I know too little Haskell to interpret this output correctly.
Even so, I think we still need your "psychic debugging skills" B-)

Any idea how I can track down what is generating all those "[]"?
Note that the (,,) seems to be the NGramTag data, which is basically
used as a list (via the Zipper).
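
One way to attribute the lists to the code that builds them is a heap
profile by cost centre instead of by type. The sketch below is a minimal
illustration, not the actual tagger code: the Zipper type here is a
hypothetical stand-in for Z.Zipper, and the cost-centre names are made up.
The SCC pragmas label the two list-producing expressions. Compiled with
-prof (plus -auto-all, or -fprof-auto on newer GHC versions) and run with
+RTS -hc, the profile should show each label as its own band.

```haskell
module Main where

-- Hypothetical stand-in for the zipper: elements left of the focus are
-- stored in reverse order, elements from the focus onward in order.
data Zipper a = Zipper [a] [a]

fromList :: [a] -> Zipper a
fromList = Zipper []

-- Move the focus one element to the right (no-op at the end).
right :: Zipper a -> Zipper a
right (Zipper ls (r:rs)) = Zipper (r:ls) rs
right z                  = z

-- Rebuild the full list. Each SCC pragma names one list-producing
-- expression so that a by-cost-centre profile can tell them apart.
toList :: Zipper a -> [a]
toList (Zipper ls rs) =
  let reversed = {-# SCC "zipper_reverse" #-} (reverse ls)
  in  {-# SCC "zipper_append" #-} (reversed ++ rs)

main :: IO ()
main = do
  let z = right (right (fromList [1 .. 10 :: Int]))
  print (toList z)
```

Without -prof the pragmas are ignored, so the annotations can stay in the
source while testing ordinary builds.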

regards,
Hugo F.

> Edward
>
> Excerpts from Hugo Ferreira's message of Wed Nov 30 09:23:53 -0500 2011:
>> Hello,
>>
>> On 11/29/2011 10:57 PM, Stephen Tetley wrote:
>>> Hi Hugo
>>>
>>> What is a POSTags and how big do you expect it to be?
>>>
>>>
>>
>> type Token = String
>> type Tag = String
>>
>> type NGramTag = (Token, Tag, Tag)
>>
>> type POSTags = Z.Zipper NGramTag
>>
>>> Generally I'd recommend you first try to calculate the size of your
>>> data rather than try to strictify things, see Johan Tibell's very
>>> useful posts:
>>>
>>>
>>> http://blog.johantibell.com/2011/06/memory-footprints-of-some-common-data.html
>>> http://blog.johantibell.com/2011/06/computing-size-of-hashmap.html
>>>
>>
>> Judging by the size of the data as String, I expect a maximum of 50 MB.
>> Profiling (after a painful 80 minutes) shows:
>>
>> total alloc = 20,350,382,592 bytes
>>
>> Way too much.
>>
>>> Once you know the size of your data - you can decide if it is too big
>>> to comfortably work with in memory. If it is too big you need to make
>>> sure you are streaming[*] it rather than forcing it into memory.
>>>
>>> If POSTags is large, I'd be very concerned about the top line of
>>> updateState - reversing lists (or sorting them) simply doesn't play
>>> well with streaming.
>>>
>>
>> The zipper does quite a bit of reversing and appending.
>> I also need to reverse lists to retain the order of the
>> characters (text). I also do sorting but I have eliminated this
>> in the tests.
>>
>> So my question: how can one "force" the reversing and append?
>> Anyone?
>>
>> TIA,
>> Hugo F.
>>
>>>
>>> [*] Even in a lazy language like Haskell, streaming data isn't
>>> necessarily automatic.
>>>
>>>
>>
>
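
On the earlier question of how to "force" the reversing and appending, a
minimal sketch using only base is given below. The helper names are
hypothetical. forceSpine evaluates every cons cell of a list, and forceElems
additionally evaluates each element to weak head normal form. Neither
evaluates nested Strings completely; for that, force from the deepseq
package (Control.DeepSeq) is the usual tool. Whether forcing helps at all
depends on whether the memory is held by unevaluated thunks, which is an
assumption here and not something the profile has confirmed.

```haskell
module Main where

-- Walk the whole spine of a list, then return the list itself.
-- Elements are left unevaluated; only the cons cells are forced.
forceSpine :: [a] -> [a]
forceSpine xs = length xs `seq` xs

-- Force the spine and evaluate every element to weak head normal form.
forceElems :: [a] -> [a]
forceElems xs = foldr seq () xs `seq` xs

-- Reverse the left part, append the right part, and force the resulting
-- spine as soon as the result is demanded, so no suspended (++) remains.
appendForced :: [a] -> [a] -> [a]
appendForced ls rs = forceSpine (reverse ls ++ rs)

main :: IO ()
main = do
  let tokens = appendForced ["cat", "the"] ["sat", "down"]
  print (forceElems tokens)
```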

-------------- next part --------------
A non-text attachment was scrubbed...
Name: postagger2.ps
Type: application/postscript
Size: 60160 bytes
Desc: not available
URL: <http://www.haskell.org/pipermail/beginners/attachments/20111202/0b1a9dc8/attachment-0001.ps>
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: postagger2.prof
URL: <http://www.haskell.org/pipermail/beginners/attachments/20111202/0b1a9dc8/attachment-0001.asc>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: postagger2.hp
Type: text/x-c++hdr
Size: 85660 bytes
Desc: not available
URL: <http://www.haskell.org/pipermail/beginners/attachments/20111202/0b1a9dc8/attachment-0001.hpp>

