HaXml, memory usage and segmentation fault

Joe English jenglish@flightlab.com
Wed, 31 Oct 2001 17:37:50 -0800


An update on Dmitry's problems with HaXml memory usage:

 + Compiling HaXml and the driver program with ghc -O helps a *lot*.

 + Using the version of HaXml that comes preinstalled with
   GHC (-package text) helps even more.  The two versions differ
   slightly in the 'Pretty' module, which is used to print the
   output.

 + I wrote an adapter that converts my parser's XML representation
   into HaXml's, so the new parser can be used as a drop-in replacement.
   This helps some, but not enough.  The heap profile using
   HaXml 1.02 has two large humps: the first from parsing the
   input, and the second from pretty-printing the output.
   (With the GHC version of HaXml, the second hump is about half
   as tall as with the "official" HaXml version.)
   With the new parser, only the second, smaller hump (from
   pretty-printing) remains.

 + Since a pretty-printer is overkill for this job, I replaced
   it with a quick hack that converts the HaXml representation
   _back_ into my representation and feeds it to a serializer
   that I had previously written.  This improves things further:
   the identity transformation 'processXmlWith keep' now has a
   flat heap profile.

 + Unfortunately, Dmitry's original program still has a space leak.
   I suspect that the HaXml combinators (or, more likely,
   HaXml's internal representation) are not as space-efficient
   as I had originally thought: when I rewrote Dmitry's test
   case to use the new parser's internal representation directly,
   I again got a flat heap profile, so there doesn't seem to be
   anything wrong with the structure of the original program.
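
To make the serializer point more concrete, here is a minimal Haskell
sketch.  It is not the unreleased code described above: `Node`, `Filter`,
`serialize` and `processWith` are hypothetical names, and `Node` is a
simplified stand-in, not HaXml's actual representation.  `keep` mirrors
the identity filter mentioned above.  The serializer produces its output
as a lazily built string with no intermediate layout document, so a
consumer such as putStr can write out and discard each prefix as it goes.
(A flat heap profile still assumes the input tree is itself produced
lazily.)

```haskell
-- Hypothetical, simplified XML tree (NOT HaXml's Content/Element types).
data Node
  = Elem String [(String, String)] [Node]  -- name, attributes, children
  | Text String                            -- character data
  deriving (Eq, Show)

-- A content filter in the style of HaXml's combinators:
-- each input node maps to zero or more output nodes.
type Filter = Node -> [Node]

-- The identity filter: returns its input unchanged.
keep :: Filter
keep n = [n]

-- Escape the characters that are special in character data and in
-- double-quoted attribute values.
escape :: String -> ShowS
escape = foldr (\c rest -> esc c . rest) id
  where
    esc '<' = showString "&lt;"
    esc '>' = showString "&gt;"
    esc '&' = showString "&amp;"
    esc '"' = showString "&quot;"
    esc c   = showChar c

-- Serialize a node as a ShowS (difference list).  The result string is
-- produced lazily, left to right, with no intermediate layout document.
serialize :: Node -> ShowS
serialize (Text s) = escape s
serialize (Elem name attrs kids) =
    showChar '<' . showString name . attrsS . showChar '>'
  . foldr (\kid rest -> serialize kid . rest) id kids
  . showString "</" . showString name . showChar '>'
  where
    attrsS = foldr (\(a, v) rest -> showChar ' ' . showString a
                      . showString "=\"" . escape v . showChar '"' . rest)
                   id attrs

-- Apply a filter to the root node and serialize every result in order.
processWith :: Filter -> Node -> String
processWith f root = foldr (\n rest -> serialize n . rest) id (f root) ""
```

For example, `take 5 (processWith keep (Elem "r" [] (repeat (Text "a"))))`
yields "<r>aa" even though the tree is infinite, which shows that the
output really is produced incrementally.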


The code will be ready to release Real Soon Now;
I'll keep you posted.


--Joe English

  jenglish@flightlab.com