[Haskell-cafe] types for parsing a tree

Jared Jennings jjenning at gmail.com
Fri Sep 10 12:53:10 EDT 2010


Dear haskell-cafe:

I'm trying to parse an Open Financial eXchange (OFX) 1.x file. It
details my bank transactions, like debit card purchases. It's
SGML-based and it goes like:

    <OFX>[...]
        <STMTRS>[...]
            <STMTTRN>[...]
                <TRNUID>9223ry29r389
                <NAME>THE GROCERY STORE BLABLABLA
                <TRNAMT>234.99
            </STMTTRN>
            <STMTTRN>[...]
                <TRNUID>1237tg832t
                <NAME>SOME DUDE ON PAYPAL 4781487
                <TRNAMT>2174.27
            </STMTTRN>
        </STMTRS>
    </OFX>

I've left out a bunch, but as you can see it's tree-shaped, and the
only reason they misused SGML rather than XML as a data serialization
language is that XML wasn't popular yet. (OFX 2.x uses XML, but my
bank doesn't use OFX 2.x.)

When I imagine how to put this into a data structure, I think:

    -- The '...' below is stuff like the date, info about the bank
    data OFX = OFX { statement :: StatementResponse, ... }
    -- The '...' below is stuff like the account number
    data StatementResponse = StatementResponse
        { transactions :: [Transaction], ... }
    data Transaction = Transaction
        { id :: String, name :: String, amount :: Decimal, sic :: Maybe Int, ... }

Then I tried to make a parser to emit those data types and failed. I
come from Python, where there's no problem if a function returns
different types of values depending on its inputs, but that doesn't
fly in Haskell.

I've tried

    data OFXThing = OFX { statement :: OFXThing }
                  | StatementResponse { ... transactions :: [OFXThing] }

but that would let me make trees of things that make no sense in OFX,
like a transaction containing a statement.

I made a

    data Tree k v = Branch k [Tree k v] | Leaf k v
    type TextTree = Tree String String

and a tagsoup-parsec parser that returns Branches for tags like OFX,
and Leafs for tags like TRNUID. But now I just have a tree of strings.
That holds no useful type information.
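
To make that concrete, here is a minimal sketch of what the tree looks
like for the first transaction above. The value `sample` and the helper
`leafValue` are made up for illustration; they are not part of my
actual parser.

```haskell
data Tree k v = Branch k [Tree k v] | Leaf k v
    deriving (Show, Eq)

type TextTree = Tree String String

-- Hand-written stand-in for what the parser returns (hypothetical value).
sample :: TextTree
sample =
    Branch "STMTTRN"
        [ Leaf "TRNUID" "9223ry29r389"
        , Leaf "NAME" "THE GROCERY STORE BLABLABLA"
        , Leaf "TRNAMT" "234.99"
        ]

-- Value of the first direct child leaf with the given tag, if any.
leafValue :: String -> TextTree -> Maybe String
leafValue _ (Leaf _ _) = Nothing
leafValue tag (Branch _ kids) =
    case [v | Leaf k v <- kids, k == tag] of
        (v : _) -> Just v
        []      -> Nothing
```

Here `leafValue "TRNAMT" sample` is `Just "234.99"`: the amount is
still a String, and the compiler is equally happy if I ask a
transaction for a "STMTRS" child.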

I want my types to say that OFXes contain statements and statements
contain transactions - just like the OFX DTD says. How can I construct
the types so that they are tight enough to be meaningful and loose
enough that it's possible to write functions that emit them?
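
One direction I can imagine, and this is only a sketch under
assumptions, is to keep the tree of strings as an intermediate step and
convert it in a second pass, with one function per record type, each
returning an `Either String` so that every function has a single result
type. The names `leaf`, `parseAmount`, `toTransaction` and
`toStatement` are hypothetical. The records are cut down to the fields
shown in the sample, `trnId` stands in for `id` to avoid clashing with
the Prelude, and `Rational` stands in for `Decimal` so that only base
is needed.

```haskell
import Numeric (readFloat, readSigned)

data Tree k v = Branch k [Tree k v] | Leaf k v
type TextTree = Tree String String

-- Cut-down records: only the fields that appear in the sample file.
data Transaction = Transaction
    { trnId  :: String
    , name   :: String
    , amount :: Rational   -- stand-in for Decimal (assumption)
    } deriving (Show, Eq)

newtype StatementResponse = StatementResponse
    { transactions :: [Transaction] }
    deriving (Show, Eq)

-- Value of the first direct child leaf with the given tag, or an error.
leaf :: String -> [TextTree] -> Either String String
leaf tag kids =
    case [v | Leaf k v <- kids, k == tag] of
        (v : _) -> Right v
        []      -> Left ("missing <" ++ tag ++ ">")

-- Parse a decimal amount exactly; reject anything with leftover input.
parseAmount :: String -> Either String Rational
parseAmount s =
    case [x | (x, "") <- readSigned readFloat s] of
        [x] -> Right x
        _   -> Left ("bad amount: " ++ show s)

-- Only a STMTTRN branch can become a Transaction.
toTransaction :: TextTree -> Either String Transaction
toTransaction (Branch "STMTTRN" kids) =
    Transaction <$> leaf "TRNUID" kids
                <*> leaf "NAME" kids
                <*> (leaf "TRNAMT" kids >>= parseAmount)
toTransaction _ = Left "expected <STMTTRN>"

-- Only a STMTRS branch can become a StatementResponse; other children
-- (the elided account details) are ignored in this sketch.
toStatement :: TextTree -> Either String StatementResponse
toStatement (Branch "STMTRS" kids) =
    StatementResponse
        <$> traverse toTransaction [t | t@(Branch "STMTTRN" _) <- kids]
toStatement _ = Left "expected <STMTRS>"
```

With this shape the nesting rules live in the function types: a
statement can only be built from transactions, and a malformed tree
shows up as a `Left` with a message instead of a nonsensical value. I
don't know whether this is the idiomatic way to do it.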

