[Haskell-cafe] Memory leak in streaming parser

Malcolm Wallace Malcolm.Wallace at cs.york.ac.uk
Mon Apr 2 08:54:09 EDT 2007


"Oren Ben-Kiki" <haskell-oren at ben-kiki.org> wrote:

> I just created an initial version of a "streaming" parser. This parser
> is intended to serve as a reference parser for the YAML spec.

An observation about your state setter functions, e.g.

  setDecision :: String -> State -> State
  setDecision decision state = State { sName         = state|>sName,
                                     sEncoding     = state|>sEncoding,
                                     sDecision     = decision,
                                     sLimit        = state|>sLimit,
                                     sForbidden    = state|>sForbidden,
                                     sIsPeek       = state|>sIsPeek,
                                     sTokens       = state|>sTokens,
                                     sCommits      = state|>sCommits,
                                     sConsumed     = state|>sConsumed,
                                     sChars        = state|>sChars,
                                     sMessage      = state|>sMessage,
                                     sLine         = state|>sLine,
                                     sColumn       = state|>sColumn,
                                     sCode         = state|>sCode,
                                     sLast         = state|>sLast,
                                     sInput        = state|>sInput }

You can shorten your code considerably by using the standard named-field
update syntax for exactly this task:

  setDecision :: String -> State -> State
  setDecision decision state = state { sDecision = decision }

Not only is it shorter, but it will often be much more efficient, since
the entire structured value is copied once, then a single field
updated, rather than being re-built piece-by-piece in 15 steps.
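
To illustrate the update syntax on more than one setter, here is a minimal
sketch. The three-field State below is a hypothetical, cut-down stand-in
for your record (the field types are my assumption), and setLine is only
an example of a second setter written the same way.

```haskell
-- Hypothetical, cut-down State: the real record has many more fields,
-- and the field types here are assumed for the sake of the example.
data State = State
  { sName     :: String
  , sDecision :: String
  , sLine     :: Int
  } deriving (Show, Eq)

-- Record update syntax: every field not mentioned is carried over as-is.
setDecision :: String -> State -> State
setDecision decision state = state { sDecision = decision }

-- A second setter in the same style, touching a different field.
setLine :: Int -> State -> State
setLine line state = state { sLine = line }
```

Adding a new field to State later does not require touching setters
written this way, whereas the fully spelled-out constructor form has to be
edited in every setter.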

> I must have done "too good a job" converting things to lazy form
> because, while the parser is streaming, it also hangs on to old state
> objects for no "obvious" reason. At least it isn't obvious to me after
> profiling it in any way I could think of and trying to `seq` the
> "obvious" suspects. Adding a `seq` in the wrong place will of course
> stop it from being a streaming parser...

You probably want to be strict in the state component, but not in the
output values of your monad.  So as well as replacing
    let ... in (finalState, rightResult)
with
    let ... in finalState  `seq`  (finalState, rightResult)
in the (>>=) method in your Monad instance (and in the separate defn of /),
you might also need to make all the named fields of your State datatype
strict.
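
A rough sketch of what I mean, under stated assumptions: the Parser
newtype, the two-field State and the nextLine action below are all
hypothetical simplifications (your real monad also carries a result that
can fail, which I have left out). The points to notice are the strictness
annotations (!) on the fields and the `seq` on the state alone in (>>=).

```haskell
-- Hypothetical two-field State; the ! marks make each field strict, so
-- evaluating a State value also evaluates its fields.
data State = State
  { sLine   :: !Int
  , sColumn :: !Int
  }

-- Assumed, simplified shape of the parser monad: state in, (state, result) out.
newtype Parser a = Parser { runParser :: State -> (State, a) }

instance Functor Parser where
  fmap f (Parser p) = Parser $ \s ->
    let (s', a) = p s
    in s' `seq` (s', f a)

instance Applicative Parser where
  pure a = Parser $ \s -> (s, a)
  Parser pf <*> Parser pa = Parser $ \s ->
    let (s1, f) = pf s
        (s2, a) = pa s1
    in s2 `seq` (s2, f a)

instance Monad Parser where
  Parser p >>= k = Parser $ \s ->
    let (s1, a)         = p s
        (finalState, b) = runParser (k a) s1
    -- Force the state only; the result b is left unevaluated.
    in finalState `seq` (finalState, b)

-- Example action (hypothetical): advance to the next line.
nextLine :: Parser ()
nextLine = Parser $ \s -> (s { sLine = sLine s + 1, sColumn = 0 }, ())
```

Because only finalState is forced, the result component stays lazy, which
is what a streaming parser needs. Whether this is enough to cure the leak
in your code I cannot say without profiling it.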

Regards,
    Malcolm

