[Haskell] Haskell as a markup language

oleg at pobox.com oleg at pobox.com
Wed Mar 1 03:46:47 EST 2006


We demonstrate that Haskell as it is, with no TH or other
pre-processors, can rather concisely represent semi-structured
documents and the rules of their processing. In short, Haskell can
implement SXML (ssax.sourceforge.net), right in its syntax and with
the *open* and extensible set of `tags'. The benefit of Haskell is of
course in static type guarantees, such as prohibiting an H1 element to
appear in the character content of other elements. It also seems that
the specification of various pre-post-order and context-sensitive
traversals is more concise in Haskell compared to Scheme. Again, we
are talking about the existing Haskell, i.e., Haskell98 plus common
extensions. No Template Haskell or even overlapping instances are
required.

The features of our framework are:

	- extensibility: at any time the user can add new `tags' and
new transformation/traversal modes.

	- static safety: there are no dynamics, and no variant data
types for elements or attributes, and thus no possibility of a run-time
pattern-match failure.

	- static guarantees: the framework statically ensures that a
`block-level element' cannot appear in the inline (i.e., character)
content and that a character-content entity cannot appear in a pure
element content. Some entities (e.g., `title') may be either element
or attribute. OTH, many other entities may not occur in the attribute
context. Thus the generated XML or HTML document will not only be
well-formed: it would also satisfy some validity constraints. More
validity constraints can be added to the framework.

	- flexibility of the traversal: the same document can be
transformed in pre-, post-, accumulating or other ways, even within
the same session. A document can be processed in a pure function, or
in a monadic action, including an IO action. In the latter case, we
can, e.g., verify URLs as we generate an HTML document.
A `tag' can be transformed depending of the transformation modes and
tag's context: e.g., the same `(title "title")' expression may appear as
an attribute or an element -- and it will be rendered differently by
the (X)HMTL serializer.

       - pleasant syntax, which closely resembles SXML. We can write
(p "string" "string1" br "string3") without unnecessary commas and
other syntactic distractions. We take polyvariadic functions to the
new level (pun intended). We also exploit functional dependencies on a
per-instance level -- which is a very useful but, alas, rarely used
technique.


We introduce two frameworks for representing semi-structured data in
Haskell. In one, semi-structured data is truly a heterogeneous
structure -- and so can be showed, stored, written and read. In the
second approach, semi-structured data are represented as a monadic
value, polymorphic over the rendering monad. Different choices of the
monad give different views of data. In GHC, this amounts to
representing semi-structured data by linked dictionaries; in JHC, both
approaches are probably equivalent. The first representation is quite
reminiscent of HList; the second has clear SYB3 (``Scratch your
boilerplate with class'') overtones, although the realization is quite
different. In particular, there is no need for recursive instances.

Because of the amount of code involved (to describe common HTML tags
and their constraints), this message is not the complete code. The
code is available from

     http://pobox.com/~oleg/ftp/Haskell/HSXML.tar.gz

Please see the files HSXML.hs, HSXML_HTML.hs and the sample code
sample1c.hs for the data-centric framework. The monad-centric
framework is in one self-contained file CSXML.hs.  The first framework
is more in line with SXML, and so will be considered here.

Our running example, inspired by the Haskell.org web site, is:

test_haskell =
    (document
     (head
      [title "Haskell" longdash "HaskellWiki"]
      [meta_tag [description "All about the language" br "Haskell"]])
    (body
     [h1 "Haskell"]
     [div (attr [title "titleline"])
      [p
       [[a (attr [href (FileURL "/haskellwiki/Image:Haskelllogo.jpg")])
         "Haskell" br "A <purely functional> language"]]
       br
      ]
      [p "Haskell is a general purpose," 
       [[em [[strong "purely"]] "functional"]] "programming language"]]))


We should point out the absence of commas.  The [[x]] syntax can be
replaced with a simple [x], but that would require overlapping
instances. So far, we have avoided overlapping instances.
Incidentally, in many Scheme systems (and, reputedly, in R6RS) square
brackets are taken to be synonymous to the round ones.


We see that the |br| can be used in various contexts: in the character
content of an element and of an attribute (cf. `description' for the
latter). However, if we try to replace "Haskell" within the
`description' attribute with [[em "Haskell"]] we get an error that
    Couldn't match `CT_attr' against `CT_inline'
      Expected type: CT_attr
      Inferred type: CT_inline

OTH, the string "Haskell" that appears within |h1| element may be
replaced with [[em "Haskell"]]. However, if we try to enter
     (h1 [[h1 "Haskell"]]) 
we get type error 
    Couldn't match `CT_inline' against `CT_block'
Indeed, the element |H1| is not allowed in the `inline' context.

We can transform the test_haskell data structure in many ways (e.g.,
extract all the titles, renumber sections, etc). We can also render it
in HTML. The result is quite predictable:

 <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN"
  "http://www.w3.org/TR/REC-html40/loose.dtd">
 <head>
 <title>Haskell&mdash; HaskellWiki</title>
 <meta name="description" content="All about the language
 Haskell"></meta>
 </head>
 <body>
 <h1>Haskell</h1>

 <div title="titleline">
 <p><a href="/haskellwiki/Image:Haskelllogo.jpg">Haskell
 <br>
 A &lt;purely functional&gt; language</a>
 <br>
 </p>
 <p>Haskell is a general purpose,<em><strong>purely</strong>functional</em>programming language</p>
</div>
</body>

We should note that |title| -- which can be either an element or an
attribute -- is indeed rendered differently depending on the context.

Just to emphasize the extensibility of the framework, we show how easy
it is to add new elements. For example, the `tags' |longdash|, |a|, |div|
and |title| are not defined in the base file HSXML.hs. We add these
tags in sample1c.hs, as follows:

Let us start with an abbreviation for the long dash. It may appear in
the character content of an element or an attribute

> data LongDash = LongDash deriving Show
> longdash :: Check_ia LongDash ct => HW ct LongDash
> longdash = HW LongDash
>
> -- and how to render it in HTML
> instance RenderInline (HW ct LongDash) where
>    render_inline f _ = emit_lit "&mdash;" >> return f

Actually, the latter instance describes rendering of longdash in any 
|MonadRender m| -- any monad that defines morphisms |emit|,
|emit_elem| and |emit_attr|.

Anchor is an inline element with an inline content

> data Anchor a b = Anchor a b deriving Show
> a attrs body = 
>   build (as_inline . HW . Anchor (as_block attrs) . rev'apppend HNil) 
> 	nil_inline body
>
> instance (Render a, RenderInline b) =>RenderInline (HW ct (Anchor a b)) where
>     render_inline f (HW (Anchor attrs body)) = 
> 	emit_elem "a" [] (Just (render attrs)) (render_ib body)
> 	>> return False


Title can be either 
    - a block-level element whose content is CT_inline
    - an attribute (whose content is, therefore, CT_attr)

> newtype Title a = Title a deriving Show
> title x = build ((`as_ctx`  co) . HW . Title . rev'apppend HNil) nil_ab x
>    where nil_ab = HW HNil `as_ctx` ci
> 	  (ci,co) = title_ctx
> class Check_ia (Title ()) i => TitleCtx i o | i -> o, o -> i where 
>     title_ctx :: (HW i a, HW o a) ; title_ctx = undefined
> instance TitleCtx CT_attr CT_battr
> instance TitleCtx CT_inline CT_block

It can be rendered context-sensitively:

> instance RenderInline a => Render (HW CT_battr (Title a)) where
>    render (HW (Title x)) = emit_attr "title" 
>			     ((render_inline False x) >> return ())
> instance RenderInline a => Render (HW CT_block (Title a)) where
>    render (HW (Title x)) = emit_elem "title" [Hint_nl] Nothing
>			     (render_ib x)

It seems many of the SXML transformations (cf. the `examples' directory
of the SSAX distribution or CVS repository, ssax.sf.net) can be
rendered in Haskell. More examples are forthcoming.



More information about the Haskell mailing list