[Haskell-cafe] Categorized Weaknesses from the State of Haskell 2011 Survey

Wed Sep 14 01:38:45 CEST 2011

On 09/13/2011 05:15 PM, Malcolm Wallace wrote:
> 
> I am the first to admit that HaXml's documentation is not as good as
> it could be, and I am sorry that you have had a bad experience.

Sorry for the tirade =) That was a while ago, but I definitely felt some
sympathy for the guy in the quote.

> One thing I am puzzled about, is just how extremely difficult it must
> be, to click on "Detailed documentation of the HaXml APIs" from the
> HaXml homepage, look for a moment until you see
> "Text.XML.HaXml.Parse" in the list of modules, click on it, and find,
> right at the top of the page, a function that parses a String into an
> XML document tree.

As someone who just wants to parse an XML file, here's what happens.
First, I click on the API docs. I'm presented with a list:

    * Text
          o XML
                + Text.XML.HaXml
                      # Text.XML.HaXml.ByteStringPP
                      # Text.XML.HaXml.Combinators
                      # DtdToHaskell
                            * Text.XML.HaXml.DtdToHaskell.Convert
                            * Text.XML.HaXml.DtdToHaskell.Instance
                            * Text.XML.HaXml.DtdToHaskell.TypeDef
                      # Text.XML.HaXml.Escape
                      # Html
                            * Text.XML.HaXml.Html.Generate
                            * Text.XML.HaXml.Html.Parse
                            * Text.XML.HaXml.Html.ParseLazy
                            * Text.XML.HaXml.Html.Pretty
                      # Text.XML.HaXml.Lex
                      # Text.XML.HaXml.Namespaces
                      # Text.XML.HaXml.OneOfN
                      # Text.XML.HaXml.Parse
                      # Text.XML.HaXml.ParseLazy
                      # Text.XML.HaXml.Posn
                      # Text.XML.HaXml.Pretty
                      # Text.XML.HaXml.SAX
                      # Schema
                            * Text.XML.HaXml.Schema.Environment
                            * Text.XML.HaXml.Schema.HaskellTypeModel
                            * Text.XML.HaXml.Schema.NameConversion
                            * Text.XML.HaXml.Schema.Parse
                            * Text.XML.HaXml.Schema.PrettyHaskell
                            * Text.XML.HaXml.Schema.PrimitiveTypes
                            * Text.XML.HaXml.Schema.Schema
                            * Text.XML.HaXml.Schema.TypeConversion
                            * Text.XML.HaXml.Schema.XSDTypeModel
                      # Text.XML.HaXml.ShowXmlLazy
                      # Text.XML.HaXml.TypeMapping
                      # Text.XML.HaXml.Types
                      # Text.XML.HaXml.Util
                      # Text.XML.HaXml.Validate
                      # Text.XML.HaXml.Verbatim
                      # Text.XML.HaXml.Wrappers
                      # Text.XML.HaXml.XmlContent
                            * Text.XML.HaXml.XmlContent.Haskell
                            * Text.XML.HaXml.XmlContent.Parser
                      # Xtract
                            * Text.XML.HaXml.Xtract.Combinators
                            * Text.XML.HaXml.Xtract.Lex
                            * Text.XML.HaXml.Xtract.Parse

Jesus! /You/ know that I want to look in Text.XML.HaXml.Parse, but /I/
don't. Let's say I choose the first link: Text.XML.HaXml. It's a list of
modules, along with their documentation. All blank! Hitting the back button.

The first thing I notice is that there seem to be specialized parser
modules for different content types, e.g. Text.XML.HaXml.Html.Parse.
Maybe I want Text.XML.HaXml.Schema.Parse? I mean, I want to parse
something with a schema, right? Nope, it's for parsing XSDs.

How about Text.XML.HaXml.Util? This looks right...

  Only a small module containing some helper functions to extract xml
  content - I would have added this to Types but I've put it into an
  additional module - to avoid circular references (Verbatim - Types)

and it's got a function called docContent which is supposed to "Get the
main element of the document..." Great. Its type is,

  docContent :: i -> Document i -> Content i

so now, to have any hope of using this function (or of figuring out
that I'm in the wrong place entirely), I have to go figure out what
those types are. Document has one constructor,

  Document Prolog (SymTab EntityDef) (Element i) [Misc]

which leads me to,

  Prolog (Maybe XMLDecl) [Misc] (Maybe DocTypeDecl) [Misc]
    XMLDecl VersionInfo (Maybe EncodingDecl) (Maybe SDDecl)
      type VersionInfo = String
      newtype EncodingDecl = EncodingDecl String
      type SDDecl = Bool
    data Misc = Comment Comment | PI ProcessingInstruction
      type Comment = String
      type ProcessingInstruction = (PITarget, String)
        type PITarget = String
    data DocTypeDecl = DTD QName (Maybe ExternalID) [MarkupDecl]
      data QName = N Name | QN Namespace Name
        type Name = String
        data Namespace = Namespace {nsPrefix :: String, nsURI :: String}
      data ExternalID = SYSTEM SystemLiteral | PUBLIC PubidLiteral
                                                      SystemLiteral
        newtype SystemLiteral = SystemLiteral String
        newtype PubidLiteral = PubidLiteral String
      data MarkupDecl =   Element ElementDecl
                        | AttList AttListDecl
                        | Entity EntityDecl
                        | Notation NotationDecl
                        | MarkupMisc Misc
        data ElementDecl = ElementDecl QName ContentSpec
          data ContentSpec =   EMPTY
                             | ANY
                             | Mixed Mixed
                             | ContentSpec CP
          ...
          ...

most of which are completely undocumented. I have no idea what any of
this stuff means! As a result, I don't know what the 'docContent'
function does, or whether or not I'm even looking in the right place. At
this point, I'm probably googling for blog entries and wondering why I'm
wasting my time when all I really need is a "hello, world" example.

If by some miracle I do discover Text.XML.HaXml.Parse.xmlParse (do I
want ParseLazy? What's the difference?) I can get myself a Document. Now
what? Do I try to understand that giant type hierarchy above? There's
nothing else in the Parse module that looks useful. All of the good
stuff, it turns out, is in the ambiguously-named
Text.XML.HaXml.Combinators. Ok, the paper helps a little bit here, if
you want to include a few years of college as a prerequisite for parsing
XML.

There are some things here that look promising, like 'elm' and 'tag'.
However, they all have mysterious types:

  elm, txt :: CFilter i

What's a CFilter?

  type CFilter i = Content i -> [Content i]

The Content type actually contains words I recognize! Awesome! But wait,
I don't have Content! I have a Document! How do I get Content out of my
Document? Argh... This is bringing back bad memories =)
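
For what it's worth, here is a minimal sketch of how I would guess the
three types relate. Everything in it is a simplified stand-in written
for illustration, not HaXml's actual definitions: the real Document has
the four fields shown earlier, the docContent body is my assumption
based only on its quoted signature, and tag and children are
hypothetical helpers named after the combinators.

```haskell
-- Simplified stand-ins, NOT HaXml's real definitions. The real types
-- carry more fields (prolog, symbol table, qualified names, ...).
data Element i = Elem String [(String, String)] [Content i]
  deriving (Eq, Show)

data Content i
  = CElem (Element i) i   -- an element, tagged with extra info of type i
  | CString String i      -- a run of text
  deriving (Eq, Show)

-- In this sketch a document is nothing more than its root element.
newtype Document i = Document (Element i)
  deriving (Eq, Show)

-- The same synonym as quoted above.
type CFilter i = Content i -> [Content i]

-- Wrap the root element as Content so that filters can be applied.
-- Assumed behaviour, modeled on the quoted signature of docContent.
docContent :: i -> Document i -> Content i
docContent info (Document root) = CElem root info

-- Keep an element only if it has the wanted name (hypothetical helper).
tag :: String -> CFilter i
tag wanted c@(CElem (Elem n _ _) _) | n == wanted = [c]
tag _ _ = []

-- Return the immediate children of an element (hypothetical helper).
children :: CFilter i
children (CElem (Elem _ _ cs) _) = cs
children _ = []

-- A tiny "hello, world" document to experiment with.
example :: Document ()
example = Document (Elem "greeting" [] [CString "hello, world" ()])
```

In GHCi, children (docContent () example) should then hand back the
single text node, which is the Document-to-Content step I could not
find.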

> In fact, my wish as a library author would be: please tell me what
> you, as a beginner to this library, would like to do with it when you
> first pick it up?  Then perhaps I could write a tutorial that answers
> the questions people actually ask, and tells them how to get the
> stuff done that they want to do.  I have tried writing documentation,
> but it seems that people do not know how to find, or use it.
> Navigating an API you do not know is hard.  I'd like to signpost it
> better.

I was trying to parse user timelines from the Twitter API. I threw away
most stuff, but wanted to go through the tree and extract the name,
body, date, etc. from the individual entries.
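
Roughly, this is the shape of the code I wanted to end up with. It is
only a sketch built on a toy tree type of my own, not on HaXml: the
Content and Filter types, the helpers, and the element names in the
sample timeline are all made up for illustration, and the (/>) operator
is merely named after the path combinator in the library.

```haskell
-- Toy XML tree, NOT HaXml's Content type.
data Content = Elem String [Content] | Text String
  deriving (Eq, Show)

type Filter = Content -> [Content]

-- Keep an element only if it has the wanted name.
tag :: String -> Filter
tag wanted c@(Elem n _) | n == wanted = [c]
tag _ _ = []

-- Immediate children of an element; text nodes have none.
children :: Filter
children (Elem _ cs) = cs
children _ = []

-- Path-style composition: apply f, then keep the children matching g.
(/>) :: Filter -> Filter -> Filter
f /> g = \c -> concatMap g (concatMap children (f c))

-- Concatenate all the text below a node.
textOf :: Content -> String
textOf (Text s) = s
textOf (Elem _ cs) = concatMap textOf cs

data Entry = Entry { name :: String, body :: String, date :: String }
  deriving (Eq, Show)

-- Text of the named child of a "status" element.
field :: String -> Content -> String
field n c = concatMap textOf ((tag "status" /> tag n) c)

-- One Entry per "status" element under the "statuses" root.
entries :: Content -> [Entry]
entries root =
  [ Entry (field "name" s) (field "text" s) (field "created_at" s)
  | s <- (tag "statuses" /> tag "status") root ]

-- Made-up sample data; the element names are assumptions.
timeline :: Content
timeline =
  Elem "statuses"
    [ Elem "status"
        [ Elem "created_at" [Text "2011-09-13"]
        , Elem "text" [Text "hello, world"]
        , Elem "name" [Text "alice"]
        ]
    ]
```

With that, entries timeline gives back a single Entry holding the name,
text, and date, with everything else in the tree ignored.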

What's really missing in my opinion is an overview of how everything
fits together, along with examples. There are a couple of "big" types
that you need to know to use the library. Document, Content, and CFilter
come to mind. All of those should be well-documented:

 * What do they represent?
 * How do they fit together?
 * Where can I get them, i.e. what functions produce them?
 * What can I do with them?

The examples don't need to be too complicated. How to read/write a file,
how to get an element's name, attributes, and text, etc. Anything is
better than nothing. Most of the examples in blog posts and other
people's code are out of date; while the differences may be small, a new
user has no way of knowing that. GHC is just going to throw a type error
that may as well be Chinese.