Data representation, maybe reflection, laziness

Tue Nov 4 00:39:50 EST 2003

On Friday, Oct 31, 2003, at 21:06 Europe/Amsterdam, Mark Carroll wrote:
> Ralf Hinze and Simon Peyton-Jones wrote an interesting paper on generic
> programming and derivable type classes. It looked like maybe programmers
> would be able to write their own "deriving xml" stuff and whatever, which
> looked great because, if there's not already one out there, I'd love to
> derive some read/show analogue automatically for data in some encoding
> that's very efficient to write and parse (i.e. not XML (-:).

Johan Jeuring and I submitted a paper [1] to PLAN-X concerning this 
topic. In an earlier paper [2] we described a Haskell-XML data binding, 
that is, a type-safe translation scheme from (a sizeable subset of) XML 
Schema to Haskell. In [1] we describe a Generic Haskell program which 
automatically infers certain coercions between the translation of an 
XML Schema type, which is very large and ugly, and a user-defined Haskell 
datatype capable of representing values of the Schema type. The idea is 
to infer the function that transforms values of the ugly type picked by 
the translator to values of a traditional, Haskellish datatype picked 
by the user.

For example, our translator takes the Schema type doc (representing a 
bibliographic entry):

     <element name="doc"    type="docType"/>
     <complexType name="docType">
       <sequence>
         <element ref="author" minOccurs="0"
                  maxOccurs="unbounded"/>
         <element ref="title"/>
         <element ref="pubDate" minOccurs="0"/>
       </sequence>
       <attribute name="key" type="string"/>
     </complexType>
     <element name="author" type="string"/>
     <element name="title"  type="string"/>
     <complexType name="pubDateType">
       <sequence>
         <element ref="year"/>
         <element ref="month"/>
       </sequence>
     </complexType>
     <element name="pubDate"
              type="pubDateType"/>
     <element name="year"  type="int"/>
     <element name="month" type="int"/>

to a certain ugly datatype X. [2] defines generic functions:

   parse{|t|} :: String -> Maybe t
   unparse{|t|} :: t -> Maybe String

(Well, we only describe parse, but unparse is very easy...)

Now say the user defines the following datatype in some module:

 > data Doc = Doc
 >   { key      :: String,
 >     authors  :: [String],
 >     title    :: String,
 >     pubDate  :: Maybe PubDate }
 >
 > data PubDate = PubDate
 >   { year     :: Integer,
 >     month    :: Integer }

This is, IMO, the `ideal' translation of the Schema type. Now, although 
X /= Doc, there is in fact a `canonical' injection X -> Doc, determined 
by the types alone, which happens to do what one wants.
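To make the injection concrete, here is a minimal plain-Haskell sketch of what one would otherwise write by hand. The types E_doc, T_doc, E_author and so on are hypothetical stand-ins for the translator's output (the real X is larger and is not shown in this message), and toDocByHand is an illustrative name. The sketch also simplifies by treating the key attribute as always present.

```haskell
-- Hypothetical stand-ins for the translator-generated type X.
-- One wrapper per element and per complex type; the real output differs.
newtype E_author  = E_author String
newtype E_title   = E_title String
newtype E_year    = E_year Integer
newtype E_month   = E_month Integer
data    T_pubDate = T_pubDate E_year E_month
newtype E_pubDate = E_pubDate T_pubDate
-- Attribute first, then the sequence of child elements as nested pairs.
data    T_doc     = T_doc String ([E_author], (E_title, Maybe E_pubDate))
newtype E_doc     = E_doc T_doc

-- The user-defined types from the message.
data Doc = Doc
  { key     :: String
  , authors :: [String]
  , title   :: String
  , pubDate :: Maybe PubDate
  } deriving (Eq, Show)

data PubDate = PubDate
  { year  :: Integer
  , month :: Integer
  } deriving (Eq, Show)

-- The coercion written out by hand: strip the wrappers and flatten the
-- nested pairs. This is the boilerplate the generic program would infer.
toDocByHand :: E_doc -> Doc
toDocByHand (E_doc (T_doc k (as, (E_title t, mp)))) =
  Doc { key     = k
      , authors = [a | E_author a <- as]
      , title   = t
      , pubDate = fmap fromPubDate mp
      }
  where
    fromPubDate (E_pubDate (T_pubDate (E_year y) (E_month m))) = PubDate y m

-- Made-up sample value, for illustration only.
sample :: E_doc
sample =
  E_doc (T_doc "k1"
    ( [E_author "A. Author", E_author "B. Author"]
    , ( E_title "Some title"
      , Just (E_pubDate (T_pubDate (E_year 2003) (E_month 11))) ) ))

main :: IO ()
main = print (toDocByHand sample)
```

Every line of toDocByHand is dictated by the two types, which is why it is a good candidate for being derived rather than written.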

In [1] we define generic functions:

   reduce{|t|} :: t -> Univ
   expand{|t|} :: Univ -> t

where Univ is a universal type which you don't need to know anything 
about. The program

 > expand{|T|} . reduce{|S|} :: S -> T

denotes the canonical function, which is inferred generically by 
inspecting the types S and T, relieving the user of the burden of 
writing it out themselves.
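The idea of going through a universal type can be approximated in plain Haskell with a type class. This is only a sketch under assumptions: here Univ is a flat list of atoms, expand returns a Maybe (unlike the total signature above), the instances are written by hand where Generic Haskell would derive them from the type structure, and the names Atom, expandS, mapFst and convert are invented for illustration.

```haskell
-- A deliberately tiny universal type: wrappers are erased and nested
-- products are flattened into a list of atoms.
data Atom = AInt Integer deriving (Eq, Show)
type Univ = [Atom]

class Rep t where
  reduce  :: t -> Univ
  -- Consume a prefix of the atoms and return what is left over.
  expandS :: Univ -> Maybe (t, Univ)

-- Expansion succeeds only if every atom is consumed.
expand :: Rep t => Univ -> Maybe t
expand u = case expandS u of
  Just (x, []) -> Just x
  _            -> Nothing

-- Apply a constructor to the value half of an expansion result.
mapFst :: (a -> b) -> Maybe (a, r) -> Maybe (b, r)
mapFst f = fmap (\(x, r) -> (f x, r))

instance Rep Integer where
  reduce n = [AInt n]
  expandS (AInt n : rest) = Just (n, rest)
  expandS []              = Nothing

instance (Rep a, Rep b) => Rep (a, b) where
  reduce (a, b) = reduce a ++ reduce b
  expandS u = do
    (a, u')  <- expandS u
    (b, u'') <- expandS u'
    Just ((a, b), u'')

-- Hypothetical translator-generated types (the "ugly" side).
newtype E_year    = E_year Integer
newtype E_month   = E_month Integer
data    T_pubDate = T_pubDate E_year E_month
newtype E_pubDate = E_pubDate T_pubDate

-- The user-defined type (the "Haskellish" side).
data PubDate = PubDate { year :: Integer, month :: Integer }
  deriving (Eq, Show)

instance Rep E_year where
  reduce (E_year y) = reduce y
  expandS = mapFst E_year . expandS

instance Rep E_month where
  reduce (E_month m) = reduce m
  expandS = mapFst E_month . expandS

instance Rep T_pubDate where
  reduce (T_pubDate y m) = reduce (y, m)
  expandS = mapFst (uncurry T_pubDate) . expandS

instance Rep E_pubDate where
  reduce (E_pubDate t) = reduce t
  expandS = mapFst E_pubDate . expandS

instance Rep PubDate where
  reduce (PubDate y m) = reduce (y, m)
  expandS = mapFst (uncurry PubDate) . expandS

-- The coercion between any two types that share a normal form.
convert :: (Rep s, Rep t) => s -> Maybe t
convert = expand . reduce

main :: IO ()
main = print (convert (E_pubDate (T_pubDate (E_year 2003) (E_month 11)))
                :: Maybe PubDate)
```

Because both E_pubDate and PubDate reduce to the same two atoms, convert needs no code specific to the pair of types; only the annotation selects the target.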

So now, say you want to write a GH program which reads in a document 
conforming to the Schema type `doc' from standard input, deletes all 
authors named "Dubya", and writes the result to standard output. Here it 
is (E_doc is the translator-generated type called X above):

< main    = interact work
< toE_doc = unparse{|E_doc|} . expand{|E_doc|} .
<           reduce{|Doc|}
< toDoc   = expand{|Doc|} . reduce{|E_doc|} .
<           parse{|E_doc|}
< work    = toE_doc .
<           (\d -> d { authors =
<              filter (/= "Dubya") (authors d) }) .
<           toDoc

And that's it. All the messy stuff is inferred by GH and the translator.
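For readers without Generic Haskell at hand, the shape of the pipeline can be tried in plain Haskell. In this sketch the derived Read and Show instances stand in for parse and unparse, so the input is Haskell record syntax rather than XML, and the Maybe returned by parsing is handled explicitly. The names parseDoc, unparseDoc and dropDubya, and the "parse error" message, are illustrative assumptions.

```haskell
import Data.Char  (isSpace)
import Data.Maybe (listToMaybe)

data Doc = Doc
  { key     :: String
  , authors :: [String]
  , title   :: String
  , pubDate :: Maybe PubDate
  } deriving (Eq, Show, Read)

data PubDate = PubDate
  { year  :: Integer
  , month :: Integer
  } deriving (Eq, Show, Read)

-- Stand-in for parse: accept the input only if it is one complete Doc
-- followed by nothing but whitespace.
parseDoc :: String -> Maybe Doc
parseDoc s = listToMaybe [d | (d, rest) <- reads s, all isSpace rest]

-- Stand-in for unparse.
unparseDoc :: Doc -> String
unparseDoc = show

-- The only application-specific step: a record update on authors.
dropDubya :: Doc -> Doc
dropDubya d = d { authors = filter (/= "Dubya") (authors d) }

-- Same shape as the GH program, with the failure case made explicit.
work :: String -> String
work = maybe "parse error\n" ((++ "\n") . unparseDoc . dropDubya) . parseDoc

main :: IO ()
main = interact work
```

Feeding it the output of show for a Doc value on standard input prints the same document with the unwanted author removed.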

OK, now the reason that I prefaced this message with "FWIW": although 
we have an implementation of the translator and coercion inferencer, 
they're only prototypes and far from usable in practice. In fact, the 
translator doesn't read XML at all but rather operates on XML abstract 
syntax (a tree datatype).

Frankly, I don't think I will take the time to turn the prototype into 
anything releasable, but I wouldn't mind turning over the sources (such 
as they are :) to someone who has a serious interest. Take a look at 
the papers and see if it appeals to you.

Regards,
Frank

[1] @misc{AJ03,
   author  = {Frank Atanassow and Johan Jeuring},
   title   = {Type isomorphisms simplify {XML} programming},
   year    = 2003,
   note    = {Submitted to PLAN-X 2004},
   url     = {http://www.cs.uu.nl/~franka/pub},
   urlpdf  = {http://www.cs.uu.nl/~franka/planx04.pdf},
   pubcat  = {journal},
}

[2] @TechReport{ACJ03c,
   author =       {Atanassow, Frank and Clarke, Dave and Jeuring, Johan},
   title =        {Scripting {XML} with {G}eneric {H}askell},
   institution =  {Utrecht University},
   year =         {2003},
   url = {ftp://ftp.cs.uu.nl/pub/RUU/CS/techreps/CS-2003/2003-023.pdf},
   number =       "UU-CS-2003"
}