[Haskell-cafe] Programming style and XML processing in Haskell

MR K P SCHUPKE k.schupke at imperial.ac.uk
Thu May 13 18:45:25 EDT 2004


Just sticking in my two pence worth...

I am not sure what application you intend this for, but I find most XML
parsers completely useless. With my application programmer's hat on, I do
not want to validate against a DTD; I want to extract as much information
as possible from bad XML... what I would like is a correcting parser - one
which outputs compliant XML, but will accept any old rubbish and make
a best-guess attempt to fix it up (based on a set of configurable
heuristic rules)...
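
To make the "correcting parser" idea a bit more concrete, here is a minimal
sketch of one possible repair rule working on a stream of tag tokens. The
Token type and the repair function are made up for illustration (they are not
from any existing library), and a real version would presumably take its rules
as configuration rather than hard-coding a single one:

```haskell
-- Hypothetical token type for "tag soup" input; an assumption for this sketch.
data Token = Open String | Close String | Chars String
  deriving (Eq, Show)

-- One fixed repair rule, assuming a stack of currently open tag names
-- (innermost first):
--   * a close tag matching something on the stack also closes everything
--     opened inside it;
--   * a close tag matching nothing on the stack is dropped;
--   * anything still open at end of input is closed.
repair :: [Token] -> [Token]
repair = go []
  where
    go open []             = map Close open
    go open (Open t : ts)  = Open t : go (t : open) ts
    go open (Chars s : ts) = Chars s : go open ts
    go open (Close t : ts)
      | t `elem` open =
          let (inner, rest) = break (== t) open
          in map Close inner ++ [Close t] ++ go (drop 1 rest) ts
      | otherwise = go open ts
```

Because repair only ever looks at the stack and the next token, it produces
output as it goes and should work on input that is far too big to hold in
memory.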

Secondly I deal with very large documents, the tree form of which won't fit
in memory, so I would see an XML parser doing the following...

	parser :: String -> [XmlElements]

	filter :: [XmlElements] -> [XmlElements]

	reader :: [XmlElements] -> ... output data types ...

	writer :: ... input data types ... -> [XmlElements]

	render :: [XmlElements] -> String

In order to keep track of the tree structure, the tree depth of each element
is encoded within the XmlElement type, thus allowing the data to be streamed
through the filters, readers etc. This means the parser can output the first
element as soon as it encounters the second element (a lazy list is a stream
in Haskell) rather than having to wait until the last element, as would happen
with a DOM tree (it is a tree, not a graph, as XML elements can only contain
sub-elements)...
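
As a rough illustration of the depth-tagged stream, here is a small sketch.
The record fields, pruneBelow and this particular render are assumptions made
up for the example, not the actual parser; it also ignores attributes,
escaping and text that follows a child element:

```haskell
-- Hypothetical flat representation: one record per element, in document
-- order, tagged with its depth in the tree (root = 0).
data XmlElement = XmlElement
  { depth :: Int
  , name  :: String
  , body  :: String   -- character data directly inside the element
  } deriving (Eq, Show)

-- A filter is just a lazy list function; this one drops every element
-- nested deeper than the given level.
pruneBelow :: Int -> [XmlElement] -> [XmlElement]
pruneBelow n = filter ((<= n) . depth)

-- Rebuild the nesting from the depth tags alone. The stack holds the names
-- of the elements that are still open, innermost first, so its length is
-- the current depth.
render :: [XmlElement] -> String
render = go []
  where
    go open [] = concatMap closeTag open
    go open (e : es) =
      -- Close every open element at or below the depth of the new one.
      let (closing, rest) = splitAt (length open - depth e) open
      in concatMap closeTag closing
           ++ "<" ++ name e ++ ">" ++ body e
           ++ go (name e : rest) es
    closeTag n = "</" ++ n ++ ">"
```

For example, rendering the three elements (0, "a", ""), (1, "b", "x") and
(1, "c", "y") gives <a><b>x</b><c>y</c></a>. Only the stack of open names is
kept in memory, so the output for one element can be written as soon as the
next one arrives.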

As I said the above is just my opinion, and as it happens I have written a
parser that does the above... I guess that is why there are several 
parsers for XML available (different requirements) and there will probably
be many more ...

	Regards,
	Keean.

