[Haskell-cafe] XML parser recommendation?

Uwe Schmidt uwe at fh-wedel.de
Wed Oct 24 04:06:42 EDT 2007


Rene de Visser wrote:

> I think a step towards supporting medium-sized documents in HXT would be to
> store the tags and content more efficiently.
> If I understand the code correctly, every tag is stored as a separate
> Haskell string. As each character of a String under GHC takes 12 bytes, this alone
> leads to high memory usage. Tags tend to repeat, so you could store each one
> only once using a hash table. Content could be stored in compressed byte
> strings.

Yes, storing element and attribute names in a packed format, something
similar to ByteString but for Unicode values, would reduce the amount
of storage. A perhaps minor shortcoming of that approach is the cost of converting
between String and the internal representation when processing names.
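For illustration, a minimal sketch of such a packed name type. It assumes the
`text` package's `Data.Text` (which did not exist when this was written) as the
packed Unicode representation; `QName`, `mkName`, `nameString` and `splitName`
are hypothetical names, not HXT's API. The conversions mentioned above appear
as `T.pack` / `T.unpack` at the boundary:

```haskell
-- Hypothetical packed name type (NOT HXT's actual QName), using Data.Text
-- from the `text` package as the "ByteString for Unicode" representation.
import qualified Data.Text as T

newtype QName = QName T.Text
  deriving (Eq, Ord, Show)

-- String -> packed: paid once, when the parser produces a name.
mkName :: String -> QName
mkName = QName . T.pack

-- Packed -> String: paid every time String-based code inspects the name.
nameString :: QName -> String
nameString (QName t) = T.unpack t

-- Split "prefix:local" directly on the packed form, without a String round trip.
-- A name without a colon gets an empty prefix.
splitName :: QName -> (T.Text, T.Text)
splitName (QName t) = case T.breakOn (T.pack ":") t of
  (local, rest) | T.null rest -> (T.empty, local)
  (prefix, rest)              -> (prefix, T.drop 1 rest)

main :: IO ()
main = do
  let n = mkName "xsl:template"
  putStrLn (nameString n)
  print (splitName n)
```

Operations like `splitName` can stay on the packed form; only code that still
expects a String pays for `nameString`.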

The hashtable approach would of course reduce memory usage, but it
would require a global change to the processing model: a document would then
no longer consist of a single tree; it would always consist of a pair of a tree and a map.
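A sketch of what that change looks like, assuming `Data.Map` from the
`containers` package in place of a hash table, and hypothetical types
`RawNode`, `Node` and `SymTab` (not HXT's data model). Interning a tree
returns the symbol table together with the tree, and every consumer that
needs a real tag name has to be handed both:

```haskell
import qualified Data.Map.Strict as Map
import Data.List (mapAccumL)

-- Hypothetical interned representation: tag names replaced by Int ids.
type NameId = Int

data Node
  = Elem NameId [Node]   -- element: interned tag id and children
  | TextNode String      -- character content
  deriving Show

-- Hypothetical input tree with plain String tags.
data RawNode = RawElem String [RawNode] | RawText String

-- The symbol table that now has to travel alongside every tree.
data SymTab = SymTab
  { toId   :: Map.Map String NameId
  , fromId :: Map.Map NameId String
  } deriving Show

emptySymTab :: SymTab
emptySymTab = SymTab Map.empty Map.empty

-- Look a name up, assigning the next free id the first time it is seen.
intern :: SymTab -> String -> (SymTab, NameId)
intern st@(SymTab to from) name =
  case Map.lookup name to of
    Just i  -> (st, i)
    Nothing ->
      let i = Map.size to
      in (SymTab (Map.insert name i to) (Map.insert i name from), i)

-- Converting a document yields a (table, tree) pair, not just a tree.
internTree :: SymTab -> RawNode -> (SymTab, Node)
internTree st (RawText s) = (st, TextNode s)
internTree st (RawElem name kids) =
  let (st1, i)     = intern st name
      (st2, kids') = mapAccumL internTree st1 kids
  in (st2, Elem i kids')

-- Recovering a tag name requires the table as well as the id.
tagName :: SymTab -> NameId -> Maybe String
tagName st i = Map.lookup i (fromId st)

main :: IO ()
main = do
  let doc = RawElem "list" [ RawElem "item" [RawText "a"]
                           , RawElem "item" [RawText "b"] ]
      (st, tree) = internTree emptySymTab doc
  print (Map.size (toId st))  -- distinct names: "list" and "item"
  print tree
  print (tagName st 1)
```

The repeated `item` tag is stored once, but `tree` alone is no longer a
self-describing document; it is only meaningful together with `st`.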

By the way, the amount of memory used for strings ([Char] values) in Haskell is
a problem for ALL text processing tasks. It's not limited to HXT, nor is it specific to XML.
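A rough per-character estimate, assuming GHC's usual heap layout and a fully
evaluated String: each (:) cell is one header word plus two pointer fields, and
Char values below U+0100 are shared static closures (characters above that need
an extra two-word C# box each). With word size w bytes and n characters:

```latex
\mathrm{bytes}_{\texttt{String}}(n) \approx 3wn
  \quad\Rightarrow\quad 12n \;(w = 4),\qquad 24n \;(w = 8)
\qquad\qquad
\mathrm{bytes}_{\text{UTF-8}}(n) \approx n + c \quad (\text{ASCII text, constant header } c)
```

So on a 64-bit machine a 10-character tag name costs about 240 bytes as a
String, against roughly 10 bytes of payload in a packed UTF-8 encoding. The
32-bit case gives the 12 bytes per character quoted above.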

For me, the efficiency problem with strings as lists of chars, and a possible
solution such as implementing String transparently and more efficiently than other lists,
is an issue for Haskell' (or Haskell''), and/or a challenge for the language implementors.

  Uwe
