Q. about XML support

Joe English jenglish@flightlab.com
Thu, 20 Feb 2003 19:34:15 -0800


Graham Klyne wrote:
> Joe English wrote:
> >What are you looking for in an XML toolkit?
>
> Hi, thanks for responding.  My desiderata:
>
> 1. Works with HUGS and GHC (I'm currently developing with HUGS, but
> anticipate using GHC for "production" code).


HXML works with Hugs, GHC, GHCI, and NHC, with
the caveat that under Hugs it suffers from a
space leak, which limits the size of documents
that can be processed.

I believe the other XML toolkits also work under
all major Haskell implementations, or can be made
to do so with a little effort.


> 2. Namespaces, though I'm prepared to roll-my-own on top of an existing XML
> parser.

See below for my thoughts on this...

> 3. Well-formedness checking would be nice;  i.e return a useful error
> indication if tags are mismatched, that sort of thing.

OK; I'll add that to the TODO list for HXML.

HaXml does some well-formedness checks (mismatched end-tags, for
example), though it doesn't look like it handles duplicate attribute
names.
The XML Toolbox does WF checking and validation.
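
As a rough illustration of the two checks mentioned above (mismatched
end-tags and duplicate attribute names), here is a minimal sketch over a
flat event list. The Event type and checkWF are hypothetical names made
up for this example; they are not HXML's, HaXml's, or the XML Toolbox's
API, and the check covers only these two constraints (it does not, for
instance, insist on a single root element).

```haskell
import Data.List (nub, (\\))

-- Hypothetical, simplified event type (not taken from any toolkit).
data Event
    = Start String [(String, String)]   -- element name, attributes
    | End String                         -- element name
    | Text String
    deriving (Eq, Show)

-- Walk the events with a stack of currently open element names.
-- Returns Left with a message at the first problem found.
checkWF :: [Event] -> Either String ()
checkWF = go []
  where
    go [] [] = Right ()
    go (open : _) [] = Left ("unclosed element <" ++ open ++ ">")
    go stack (Start name atts : rest) =
        -- names \\ nub names leaves only the repeated occurrences
        case map fst atts \\ nub (map fst atts) of
            (dup : _) -> Left ("duplicate attribute " ++ show dup
                               ++ " in <" ++ name ++ ">")
            []        -> go (name : stack) rest
    go (open : stack) (End name : rest)
        | open == name = go stack rest
        | otherwise    = Left ("end-tag </" ++ name
                               ++ "> does not match <" ++ open ++ ">")
    go [] (End name : _) = Left ("unexpected end-tag </" ++ name ++ ">")
    go stack (Text _ : rest) = go stack rest
```

Returning Either keeps the failure as a value, so the caller decides
whether a malformed document is fatal.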


> 4. Validation is not required for my application.
>
> >As far as HXML goes, I have a rough sketch of an
> >implementation of XML namespace support, not yet
> >finished or released.  (This is a somewhat thorny
> >problem; implementing XMLNS is not hard, but implementing
> >it in a sane way requires some ingenuity.)
>
> I was looking at HXML yesterday, and it has the great advantage that I feel
> I can understand it well enough to tinker.  And the code looks clean to my
> Haskell-inexperienced eye.  The main drawback is the lack of
> well-formedness checking, but I think I could live with that, at least for
> prototyping purposes.
>
> I think your presentation of an XML parse as a tree of XMLNodes closely
> matches what I want to do.  Would it make sense to add a new node
> constructor indicating a syntax error?


That's a good idea.  There's something similar in the
[XMLEvent] representation (HXML's lazy functional equivalent
of SAX).  ErrorEvents presently turn into hard errors when the
tree is built, but a separate XMLNode type for errors would
allow these to be propagated into the Tree view without
introducing unnecessary strictness.
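
A minimal sketch of what that could look like, assuming simplified
stand-ins for the event and node types. The constructor names here
(including ERNode for the proposed error node) and the functions
buildForest and treeErrors are illustrative assumptions; HXML's actual
definitions differ.

```haskell
-- Hypothetical, simplified types for illustration only.
data XMLEvent
    = StartEvent String
    | EndEvent String
    | TextEvent String
    | ErrorEvent String
    deriving (Eq, Show)

data XMLNode
    = ELNode String
    | TXNode String
    | ERNode String      -- proposed: a syntax error recorded in the tree
    deriving (Eq, Show)

data Tree a = Tree a [Tree a]
    deriving (Eq, Show)

-- Build the sibling forest up to the next end-tag, returning the
-- unconsumed events. An ErrorEvent becomes an ordinary leaf instead
-- of a call to 'error'. End-tag names are not checked here.
buildForest :: [XMLEvent] -> ([Tree XMLNode], [XMLEvent])
buildForest [] = ([], [])
buildForest (EndEvent _ : rest) = ([], rest)
buildForest (StartEvent name : rest) =
    let (kids, afterKids) = buildForest rest
        (sibs, afterSibs) = buildForest afterKids
    in  (Tree (ELNode name) kids : sibs, afterSibs)
buildForest (TextEvent txt : rest) = leaf (TXNode txt) rest
buildForest (ErrorEvent msg : rest) = leaf (ERNode msg) rest

-- Emit a childless node, then continue with its following siblings.
leaf :: XMLNode -> [XMLEvent] -> ([Tree XMLNode], [XMLEvent])
leaf node rest =
    let (sibs, afterSibs) = buildForest rest
    in  (Tree node [] : sibs, afterSibs)

-- Collect the error messages in document order.
treeErrors :: Tree XMLNode -> [String]
treeErrors (Tree node kids) = here ++ concatMap treeErrors kids
  where
    here = case node of
        ERNode msg -> [msg]
        _          -> []
```

Because the recursive results are bound with let, they are only forced
when a consumer inspects them, and an application that cares about
errors can call something like treeErrors after the fact.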


> I also thought briefly about adding namespace support, and contemplated
> replacing your
>
>     type Name = String    -- from memory, maybe not exactly right?
>
> with something like
>
>     data Name = QName String String
>
> where the two strings would be namespace URI and local name
> respectively.

That's what most XML toolkits do (i.e., treat Names
as URI + local-name pairs).  I don't think this is
the best way to do things though; this can lead to
monstrosities like:

    case nodeName node of
        QName "http://www.w3.org/1999/xhtml" "p" -> ...
        QName "http://www.w3.org/1999/xhtml" "h1" -> ...
        QName "http://www.w3.org/1999/xhtml" "pre" -> ...

This can be simplified, of course, but it's really much
easier if you can treat names as atomic strings (just 
like in SGML and in pre-Namespaces XML):

    case nodeName node of
        "html:p" -> ...
        "html:h1" -> ...
        "html:pre" -> ...

The approach I'm thinking of is to let the application programmer
define an "internal" namespace environment, then rewrite
element and attribute names in the parsed document to
use the locally-defined prefixes.
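
A minimal sketch of the renaming step, under some assumptions: the
type and function names below (InternalEnv, DocEnv, splitQName,
rewriteName) are invented for this example and are not part of HXML,
and the declarations in scope are assumed to have been collected
already from the xmlns attributes. It handles element names only;
unprefixed attribute names are in no namespace and would need a
separate rule.

```haskell
type URI    = String
type Prefix = String

-- Application-chosen "internal" environment: namespace URI -> prefix
-- the application wants to see.
type InternalEnv = [(URI, Prefix)]

-- Declarations in scope in the document: prefix -> namespace URI.
-- The empty prefix stands for the default namespace.
type DocEnv = [(Prefix, URI)]

-- Split "pfx:local" at the first colon; no colon means no prefix.
splitQName :: String -> (Prefix, String)
splitQName name = case break (== ':') name of
    (pfx, ':' : local) -> (pfx, local)
    (local, _)         -> ("", local)

-- Rewrite an element name from the document's prefix to the
-- application's prefix for the same namespace. Nothing means the
-- prefix is undeclared or the namespace is unknown to the application.
rewriteName :: InternalEnv -> DocEnv -> String -> Maybe String
rewriteName internal doc name = do
    let (pfx, local) = splitQName name
    uri  <- lookup pfx doc
    pfx' <- lookup uri internal
    return (if null pfx' then local else pfx' ++ ":" ++ local)
```

With the XHTML namespace mapped to "html" internally, a document that
writes <x:p> or uses a default namespace and writes <p> would present
the same name, "html:p", to the application.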


> I haven't yet figured what the cascading effects of such a
> change might be.   Better, maybe, I define a new XMLnode type that uses
> QName instead of Name, and write a function to translate a (Tree XMLNode)
> to a (Tree XMLQNode)?  That keeps things cleanly separated.

Another approach is to parameterize XMLNode on the type
of names (XMLNode String vs. XMLNode QName).  Or, you
could store the namespace name and local name using
James Clark's notation, "{http://www.w3.org/1999/xhtml}p".
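
Both options can be sketched briefly. The definitions below are
assumptions for illustration (a cut-down XMLNode with only element and
text constructors, plus hypothetical helpers toClark and fromClark);
they are not HXML's actual types.

```haskell
-- Namespace URI and local name; an empty URI means "no namespace".
data QName = QName String String
    deriving (Eq, Show)

-- A node type parameterized on the representation of names.
data XMLNode name
    = ELNode name [(name, String)]   -- element name, attributes
    | TXNode String
    deriving (Eq, Show)

-- Changing the name representation is then just fmap.
instance Functor XMLNode where
    fmap f (ELNode n atts) = ELNode (f n) [(f a, v) | (a, v) <- atts]
    fmap _ (TXNode t)      = TXNode t

-- Render a QName as a single string in "{uri}local" form.
toClark :: QName -> String
toClark (QName "" local)  = local
toClark (QName uri local) = "{" ++ uri ++ "}" ++ local

-- Parse "{uri}local" back into a QName. A string with an unclosed
-- brace is kept whole as a local name.
fromClark :: String -> QName
fromClark ('{' : rest) = case break (== '}') rest of
    (uri, '}' : local) -> QName uri local
    _                  -> QName "" ('{' : rest)
fromClark local = QName "" local
```

With this, fmap toClark converts an XMLNode QName into an XMLNode
String, so code that prefers atomic string names and code that prefers
structured names can share one tree type.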

> BTW, do you have a test suite for your parser?  (I've found the HUnit
> library to be very useful, and easily transferred my previous experience
> with JUnit.)

No, but I really should.  Another one for the TODO list :-)


--Joe English

  jenglish@flightlab.com