[Haskell-cafe] memory, garbage collection and other newbie's issues

Wed Oct 18 14:44:38 EDT 2006

On 10/18/06, Andrea Rossato <mailing_list at istitutocolli.org> wrote:
> Hi!
>
> I'm a newbie and, as a learning experience, I'm writing a feed reader
> with hscurses and hxt. For the present time the feed reader just reads
> a Liferea cache but, as you can imagine, I'm running into the usual
> newbie problems of memory consumption and garbage collection, probably
> (I'm not sure) related to strictness/laziness.
>
> Even though I spent a couple of hours searching the mailing list
> archives, I did not come up with something I can relate to, so I'll
> try to explain my problem.
>
> The feed reader, which should be compatible with Liferea, takes an opml
> (1.0) file, which stores information on folders and subscribed feeds.
> It uses it as the major component of the state of an ST monad, after
> adding some attributes used by the reader UI.
>
> The UI, which uses the widget library of hscurses and is derived from
> the Contact Manager example, will just display this opml file, and
> every UI event (collapsing/expanding of folders, displaying feeds,
> tagging, flagging, and so on) is just an XML transformation of this
> opml state component.
>
> So, when the feed reader boots, only the layout of folders and
> subscribed feeds is presented to the user.
>
> When the user selects a feed to be displayed, the cached file,
> containing up to 100 saved posts, is read and transformed into a data
> type (called Feed, obviously). After that, this data type is
> transformed into an opml (xml) tree, which is inserted as a child in
> the appropriate place of the opml state component.
>
> Moreover, the parent element of the opml state component (which holds
> the original information of the subscribed feed) is edited to add
> general feed information (such as last update, feed's attributes, and
> so on) retrieved by reading the file.
>
> When the user collapses the feed, the added opml chunk is deleted from
> the state component (but not the information added to the parent of
> this chunk).
>
> Now, I would expect that after the opml chunk is deleted all the
> memory allocated for reading the cached file would be garbage
> collected. This is not happening, so, every time you open (or reopen)
> a feed, the used memory of the feed reader increases, and never
> decreases.
>
> After profiling I've seen that the problem is occurring in the
> function that reads the cached file:
>
> loadFeed :: String -> IO [Feed]
> loadFeed feedId =
>     do [a] <- runX $ readDocument [(a_validate, v_0)] (cachePath ++ feedId)
>        return $ runLA toFeed a
>
> What this function does is read the file with:
> h <- openFile ...
> hGetContents h
> and apply some XML filters to get the Feed type populated with the
> needed information.
>
> I tried making the function strict with $!. I tried using fps. It
> doesn't change this behaviour, obviously.

Have you tried adding strictness annotations to your data type?  For
example, something like this:

data Foo a = Foo !a

If you do this for the datatypes you want to get deleted, my
understanding is that it will help.  I think this helps because the
fields are then evaluated as soon as the value is constructed, instead
of being left as unevaluated thunks.  If I understand correctly, a
thunk keeps alive everything it refers to (for example, the whole
string read from the cached file), so the garbage collector cannot
reclaim that memory until the thunk is evaluated or itself becomes
unreachable.
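
To make that concrete, here is a minimal sketch with a hypothetical
feed type; the type, field and function names are made up for
illustration and are not taken from your program.  One caveat, as far
as I know: a ! field is only evaluated to weak head normal form, which
for a String means just the first (:) cell, so the helpers below force
the strings completely before building the values.

```haskell
-- Hypothetical types, roughly shaped like a parsed feed.
data Item = Item
  { itemTitle :: !String  -- '!' evaluates the field to WHNF on construction
  , itemLink  :: !String
  }

data Feed = Feed
  { feedTitle :: !String
  , feedItems :: ![Item]
  }

-- WHNF of a String is only its first (:) cell; the tail may still be a
-- thunk that refers to the input it was parsed from.  This walks the
-- whole list and evaluates every character.
forceString :: String -> ()
forceString = foldr seq ()

-- Smart constructors that evaluate the fields completely before building
-- the value, so no thunk inside it can keep the parsed input alive.
mkItem :: String -> String -> Item
mkItem t l = forceString t `seq` forceString l `seq` Item t l

-- Forces the title and every item (assumed to be built with mkItem).
mkFeed :: String -> [Item] -> Feed
mkFeed t items = forceString t `seq` foldr seq () items `seq` Feed t items
```

With these, mkItem "title" "http://example.org/" gives an Item whose
strings are fully evaluated, whereas seq-ing a plain
Item ('a' : undefined) "x" succeeds, because the strict field only
forces the first cell of the string.  (The deepseq package's NFData
class and force function do the same job generically; the helpers here
use only the Prelude.)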

You said you did profiling; have you done retainer profiling (the -hr
RTS option, in a program built with -prof)?  I haven't used it myself,
but as I understand it, it is designed to help you identify what is
keeping the leaked memory alive.
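
Here is a small, self-contained program you could try retainer
profiling on.  It is a hypothetical imitation of the pattern I suspect
(a lazily computed value that keeps a whole file's contents alive), not
your actual code; fakeCache, leaky and strict are made-up names, and
the build commands in the comment assume a reasonably recent GHC.

```haskell
-- Hypothetical leak, for trying out retainer profiling.  Add a main
-- (e.g. main = leaky 50 >>= print), then, with a recent GHC:
--   ghc -prof -fprof-auto -rtsopts Leak.hs
--   ./Leak +RTS -hr -RTS
--   hp2ps -c Leak.hp
import Control.Exception (evaluate)
import Control.Monad (forM_)
import Data.IORef (modifyIORef, modifyIORef', newIORef, readIORef)

-- Stand-in for the contents of a cached feed file
-- (tens of thousands of characters).
fakeCache :: Int -> String
fakeCache n = concat (replicate 10000 ("post " ++ show n ++ "\n"))

-- Leaky: the whole string is read (as displaying it would), then only an
-- unevaluated line count is kept.  Each stored thunk still points at its
-- string, so none of the strings can be garbage collected.
leaky :: Int -> IO Int
leaky rounds = do
  ref <- newIORef []
  forM_ [1 .. rounds] $ \n -> do
    let s = fakeCache n
    _ <- evaluate (length s)              -- force the whole string
    modifyIORef ref (length (lines s) :)  -- store a thunk that retains s
  fmap sum (readIORef ref)

-- Fixed: each count is evaluated before it is stored, so the string it
-- came from becomes unreachable at the end of the round.
strict :: Int -> IO Int
strict rounds = do
  ref <- newIORef []
  forM_ [1 .. rounds] $ \n -> do
    let s = fakeCache n
    _ <- evaluate (length s)
    c <- evaluate (length (lines s))      -- evaluate now, not later
    modifyIORef' ref (c :)
  fmap sum (readIORef ref)
```

With main = leaky 50 >>= print, the retainer profile should show memory
growing with every round; with main = strict 50 >>= print it should
stay roughly flat, since each count is evaluated before it is stored.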

I've been using Haskell on and off for over a year now but still
consider myself to be mostly a newbie as well.  I started using
Haskell at work recently on a project, and I've found that although
Haskell makes it so that I don't spend much time debugging (thanks,
referential transparency!), testing (thanks, QuickCheck!), or writing
code (thanks, Haskell in general!), I do spend a lot of time profiling
and optimizing for time and space.  I think it's interesting that
development is mostly quick and easy, but polishing a program and
making it ready for general use can still be hard because of
performance issues.  Still, I'd rather spend my time optimizing
something that works than debugging pointer problems.

HTH,
Jason