[Haskell-cafe] Stripping text of xml tags and special symbols

Jeremy Shaw jeremy at n-heptane.com
Tue Aug 5 17:46:15 EDT 2008


At Tue, 5 Aug 2008 23:21:43 +0200,
Pieter Laeremans wrote:

> And is there some haskell function which converts special tokens like & ->
> &amp; and é -> &eacute; ?

By default, XML has only five predefined entities: quot, amp, apos, lt,
and gt. Any additional ones have to be declared in the DTD.
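
As a rough sketch of what handling just those five looks like (the names
`predefined` and `escapePredefined` are made up for this example, not taken
from any library):

```haskell
-- The five entities predefined by XML, keyed by the character they stand for.
predefined :: [(Char, String)]
predefined =
  [ ('"',  "&quot;")
  , ('&',  "&amp;")
  , ('\'', "&apos;")
  , ('<',  "&lt;")
  , ('>',  "&gt;")
  ]

-- Replace each of the five special characters by its entity reference;
-- every other character is passed through unchanged.
escapePredefined :: String -> String
escapePredefined = concatMap (\c -> maybe [c] id (lookup c predefined))
```

For instance, `escapePredefined "a < b & c"` gives `"a &lt; b &amp; c"`.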

But you can *always* use numeric character references, either decimal:

    &#nnnn;
or hexadecimal:
    &#xhhhh;

So you should be able to implement a simple function that whitelists
a few characters ('a'..'z', 'A'..'Z', '0'..'9', ...) and encodes
everything else as a numeric character reference.
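
Something along these lines might do as a starting point. It is only a
sketch: the names `isSafe`, `escapeChar` and `escapeText` are invented here,
and the exact whitelist (ASCII letters, digits and a little punctuation) is
an assumption you would want to adjust.

```haskell
import Data.Char (isAlphaNum, isAscii, ord)

-- Characters passed through unchanged. This whitelist is an assumption
-- for the example; extend or shrink it as needed.
isSafe :: Char -> Bool
isSafe c = isAscii c && (isAlphaNum c || c `elem` " .,-_")

-- Whitelisted characters are kept as they are; anything else becomes a
-- decimal numeric character reference of the form &#nnnn;
escapeChar :: Char -> String
escapeChar c
  | isSafe c  = [c]
  | otherwise = "&#" ++ show (ord c) ++ ";"

-- Escape a whole string character by character.
escapeText :: String -> String
escapeText = concatMap escapeChar
```

With that, `escapeText "caf\233 & <b>"` gives `"caf&#233; &#38; &#60;b&#62;"`.
Note that this does nothing about characters XML does not allow at all
(most control characters), which cannot be written even as numeric
references.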

You might look at the source code for Text.XML.HaXml.Escape and
Network.URI.escapeString for inspiration.

j.

http://en.wikipedia.org/wiki/List_of_XML_and_HTML_character_entity_references

