[Haskell-beginners] question, chapter 10 Real World Haskell

Sat Apr 11 08:01:50 EDT 2009

On Sat, 11 Apr 2009 02:02:04 -0700
Michael Mossey <mpm at alumni.caltech.edu> wrote:

> I'll looking at the parser example, page 242 in Chapter 10 of Real
> World Haskell, and they are defining a type of monadic parser with
> the help of an operator they call ==>
> 
> You can find chapter 10 online. This ebook doesn't have page numbers,
> but you can find the example I'm looking at in the second called "A
> more interesting parser", about 40% of the way down:
> 
> <http://book.realworldhaskell.org/read/code-case-study-parsing-a-binary-data-format.html>
> 
> The authors have defined their parser by chaining together functions
> with ==>. The first function is "getState". What confuses me is: they
> use getState to "get the state out of the Parser," but a Parser is by
> definition a function that takes the parse state as its lone
> argument. I don't understand why they can't drop getState entirely.
> Maybe it's a simply a way to avoid wrapping the entire function in
> Parser (...)?
> 
> Some of this stuff looks "inefficient" to me, but I realize that in a
> lazy language with an optimizing compiler you can often write long
> chains of functions (many of which discard their results) and not
> impede efficiency.
> 
> 
> Thanks,
> Mike
> 
> _______________________________________________
> Beginners mailing list
> Beginners at haskell.org
> http://www.haskell.org/mailman/listinfo/beginners

This is a matter of good coding style (in the sense of modularity,
maintanability, etc.).

As you've seen it, the authors have first
started with a (>>?) function that allows them to combine Maybes, that
is to combine functions with a possibility of error. 

We therefore had the following functions and types:

> data Maybe a     -- The type of parsing results
> (>>?) :: Maybe a -> (a -> Maybe b) -> Maybe b
>                  -- To combine parsing functions
>
> Nothing :: Maybe a    -- To signal an error
> Just :: a -> Maybe a  -- To return a result without error 

Code written using this (i.e. parseP5_take2) makes explicit use of the
representation of parsing results as Maybe values.

When they added implicit state and error messages to the machinery,
they had to modify their parsing code by renaming things around. More
importantly, the changes affected all their code, even the parts that
wouldn't make use of the new functionality. If they had already written
a huge body of code that could work perfectly well without reporting
errors or accessing state, they'd still have needed to refactor it.

Now assume they had, instead, treated the type of parsing results as
abstract from the beginning:

> newtype Parse a = Parse (Maybe a) 
>                  -- The type of parsing results
> (===>) :: Parse a -> (a -> Parse b) -> Parse b
> p ===> f = case p of
>              Parse Nothing -> Parse Nothing
>              Parse (Just a) -> f a
>
> failure :: Parse a
> failure = Parse Nothing
>
> result :: a -> Parse a
> result a = Parse (Just a)

Writing their code in terms of Parse, (===>), failure and result rather
than Maybe, (>>?), Nothing and Just wouldn't have been any more
difficult. But they could then have added implicit state and error
messages by simpliy replacing the definitions:

> newtype Parse a = Parse (ParseState -> Either String (a, ParseState))
>                  -- The type of parsing results
> (===>) :: Parse a -> (a -> Parse b) -> Parse b
> (===>) = ...
>
> failure = bail "generic error"
> result = identity
>
> -- And the new primitives:
> bail :: String -> Parse a
> getState :: Parse ParseState
> putState :: ParseState -> Parse ()

Doing so, code that didn't make use of the new functionality
wouldn't have needed to be modified _at all_. Code that made use of it
would simply have called the new functions bail, getState and putState
at appropriate points. Given the magnitude of the change in
functionality, this is actually pretty incredible.

Now, and to finally answer your question:

Besides state and errors, there are a number of other useful facilities
that can fit in the framework of a (Parse a) type and a (===>) function
as above. Real World Haskell will introduce them in a systematic way
starting with Chapter 14 (Monads).

As long as our parsing code treats the Parse type as abstract, adding
another one of these facilities (like, say, backtracking) will be as
straightforward as adding state and error messages was in my example
above, and won't require any change in code that doesn't make use of
it. If we started explicitly using the fact that (Parse a) is a
function on ParseState, things would be different.

It is, generally, a matter of good style in all programming paradigms
to treat values as abstract as often as possible and only access them
through a well-defined interface that isn't dependent on their actual
representations. This is even more true in cases like this, when the
interface in question conforms to a well-known pattern (in this
particular case, it's a Monad) that can accomodate a large number of
different capabilities and uses.