[Haskell-cafe] Lessons from memory leak in streaming parser

Oren Ben-Kiki haskell-oren at ben-kiki.org
Sat Apr 21 18:30:53 EDT 2007


First, let me thank all the people who responded to my issue; the
replies were very helpful. I would have responded earlier, but I was
on a business trip and out of contact for the past week.

On the bright side, I used the time to rework my implementation.
Instead of relying on Haskell's lazy evaluation, elegant as it may be,
I changed it to use a continuation-passing style. Each invocation of
the parser returns a few tokens and a parser for the rest of the
input.
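To make the shape of that concrete, here is a minimal sketch of the idea. All the names (Token, Parser, charTokens, drain) are made up for illustration; this is not the actual YamlReference code:

```haskell
-- Hypothetical sketch of a continuation-passing streaming parser:
-- each step returns a batch of tokens, the unconsumed input, and
-- (maybe) a parser for the rest of the input.
newtype Token = Token Char
  deriving (Show, Eq)

newtype Parser = Parser { step :: String -> ([Token], String, Maybe Parser) }

-- A trivial parser that emits one token per invocation.
charTokens :: Parser
charTokens = Parser go
  where
    go []       = ([], [], Nothing)
    go (c : cs) = ([Token c], cs, Just charTokens)

-- Drive the parser to completion, concatenating the token batches.
drain :: Parser -> String -> [Token]
drain p input =
  case step p input of
    (tokens, rest, Just p') -> tokens ++ drain p' rest
    (tokens, _,    Nothing) -> tokens
```

Since each step hands back an explicit continuation instead of a lazily-built result, the caller decides when (and whether) to pull more tokens, which is what makes the scheme portable to strict languages with closures.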

Interestingly, this approach maps well to non-lazy languages -
basically anything that supports closures - unlike the original method
that relied on lazy evaluation. So in principle I can now convert the
Haskell code to Scheme, Ruby, or even JavaScript.

At any rate, once I had done that, things started to fall into place.
I still had minor leaks, but the results of profiling were actually
proving useful for a change. I needed to turn off lazy evaluation in
several key places (almost all record fields are now strict, and a
'seq' was needed in one crucial location).
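For readers who have not fought this particular battle, the two fixes look roughly like this. The State type, its fields, and the functions are hypothetical, not the actual package code:

```haskell
-- Strict record fields (the bangs): the Ints are forced whenever a
-- State is built, so no chain of (+1) thunks can accumulate.
data State = State
  { charOffset :: !Int
  , lineNumber :: !Int
  } deriving (Show, Eq)

advance :: Char -> State -> State
advance c s =
  State (charOffset s + 1)
        (if c == '\n' then lineNumber s + 1 else lineNumber s)

-- The explicit `seq` forces each intermediate State before looping,
-- keeping the loop in constant space instead of building a tower of
-- suspended States.
countLines :: String -> State -> State
countLines []       s = s
countLines (c : cs) s = let s' = advance c s in s' `seq` countLines cs s'
```

Without the `seq`, the recursion would return a nest of unevaluated `advance` applications, each holding on to the previous state, which is exactly the kind of leak the heap profiler points at.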

The result is a parser that can handle anything you throw at it with
constant (reasonably low) memory usage. It is dog-slow (about 8k/sec,
or twice that when compiled with -O2), but I was expecting that.

So, lesson learned - lazy evaluation in Haskell is a very nice
feature, but extremely difficult to debug, profile, and control.
Using continuations may be less elegant in places, but it is much
more practical.

You can see the final result in version 0.3 of the YamlReference
package I just uploaded to the Cabal database. This is intended to be
an "executable version" of the YAML specification (the BNF file, which
is actually Haskell code, is almost exactly the list of spec BNF
productions).

Thanks again for all the help,

    Oren Ben-Kiki

