[Haskell-cafe] Newbie: a parser for a list of objects?

Mon Mar 26 12:45:38 EDT 2007

> -----Ursprüngliche Nachricht-----
> Von: "Dmitri O.Kondratiev" <dokondr at gmail.com>
> Gesendet: 26.03.07 16:44:12
> An: haskell-cafe at haskell.org
> Betreff: [Haskell-cafe] Newbie: a parser for a list of objects?

> Please see my questions inside comments {-- --} :
> Thanks!
> 
> ---
> module Parser where
> 
> import Data.Char
> 
> type Parse a b = [a] -> [(b, [a])]
> 
> {--
> Newbie: a parser for a list of objects?
> 
> I am working with the section  17.5 "Case study: parsing expressions" of the book "Haskell The Craft of Functional Programming", where a parser for a list of objects is defined. 
> I called this function pList in order to avoid confusion with 'list' as a term for data structure.
> 
> Please help me to understand how pList works (please, see the rest of the code at the end of this message):
> --}
>  
> pList :: Parse a b -> Parse a [b]
> pList p = (succeed []) `alt`
>          ((p >*> pList p) `build` (uncurry (:)))
> 
> 
> {--
> First of all, I don't quite understand why there must be a choice ('alt') between the function ('succeed') that always returns an empty list and the other part? This results in adding [] to the front, why? 
> 

Well, if the parser p doesn't succeed, we don't want the whole thing to fail. And p will (almost certainly) fail when the end of input is reached.
So without the alternative 'succeed []', we'd get

pL1 dig "12"  = [(('1':y),rem) | (y,rem) <- pL1 dig "2"]
                   = [(('1':y),rem) | (y,rem) <- [(('2':z),rem2) | (z,rem2) <- pL1 dig ""]]
                   = [(('1':y),rem) | (y,rem) <- [(('2':z),rem2) | (z,rem2) <- []]
                   = [(('1':y),rem) | (y,rem) <- []]
                   = []

because dig "" = []

> I thought that 'simplified' version of pList should still work fine. Trying to prove this I wrote :
> --}
> 
> pL1 :: Parse a b -> Parse a [b]
> pL1 p = (p >*> pL1 p) `build` (uncurry (:))
> 
> {--
> Which, as expected, does not work correctly - just gives an empty list [] -  but I don't understand why:

because the parser eventually fails when the end of input is reached.
> 
> *Parser> t1 "12345"
> []
> *Parser>
> 
> Also, I don't understand why the textbook version of pList gives this result: 
> 
> *Parser> test "12345"
> [("","12345"),("1","2345"),("12","345"),("123","45"),("1234","5"),("12345","")]

That's because of the order of alt's arguments:

(succeed [] `alt` p) inp = [([],inp)] ++ (p inp)

with pList p = ((p >*> pList p) `build` (uncurry (:))) `alt` succeed []
the resulting list woulde be reversed.

> 
> *Parser>
> 
> In particular, I don't understand where the first element ("","12345") of the resulting list comes from? 
> 
> I am trying to figure out how pList recursively unfolds. To my mind operators in the expression:
> 
> 
> (succeed []) `alt`((p >*> pList p) `build` (uncurry (:)))
> 
> has the following execution order:
> 1)  >*>
> 2) 'build'
> 3) 'alt'
> 
No, the first argument of alt gets evaluated first, because (p1 `alt` p2) inp = (p1 inp) ++ (p2 inp), thus we need p1 inp first.
Then we see we haven't hit bottom, so we need the second argument of (++) (resp. alt).
So next we need to evaluate p, then pList p, combine the results of those with the second argument of build, uncurry (:).

> It seems that operation >*> should be done as many times as many elements the input list has. Right?
> 

Unfortunately not. Let's stay with pList dig. Say your input starts with n digits.
>From the example above you can conjecture that length (pList dig inp) == (n+1).
Now in the outermost (dig >*> pList dig) branch, you apply (pList dig) to an input beginning with (n-1) digits, returning a list of length n,
to each element of this list you adjoin the first digit, resulting in n + (n-1) + ... + 1 = n*(n+1)/2 applications of (>*>).
(Lesson: you need an exclusive choice, using the second parser only if the first one fails and a maximal munch combinator in your library, too)

> 
> Signature:
> 
> (>*>) :: Parse a b -> Parse a c -> Parse a (b, c)   
> 
> implies that second argument of the expression:
> 
> p >*> pList p
> 
> should be of type 'Parse a c' but in this application it is of type 'Parse a b -> Parse a [b]'
> 
c is [b], so p >*> pList p has type Parse a (b,[b]), then
(p >*> pList p) `build` (uncurry (:)) has type Parse a [b]

> How can that be?
> How recursion termination conditinon is expressed in pList?

recursion terminates when p fails.

HTH,
Daniel

> --}
> 
> none :: Parse a b
> none inp = []
> 
> succeed :: b -> Parse a b
> succeed val inp = [(val, inp)]
> 
> suc:: b -> [a] -> [(b, [a])]
> 
> suc val inp = [(val, inp)]
> 
> spot :: (a -> Bool) -> Parse a a
> spot p [] = []
> spot p (x:xs) 
>      | p x = [(x, xs)]
>      | otherwise = []
> 
> alt :: Parse a b -> Parse a b -> Parse a b
> alt p1 p2 inp = p1 inp ++ p2 inp
> 
> bracket = spot (=='(')
> dash = spot (== '-')
> dig = spot isDigit
> alpha = spot isAlpha
> 
> infixr 5 >*>
> 
> (>*>) :: Parse a b -> Parse a c -> Parse a (b, c)
> 
> (>*>) p1 p2 inp = [((x,y), rem2) |(x, rem1) <- p1 inp, (y, rem2) <- p2 rem1]
> 
> build :: Parse a b -> (b -> c) -> Parse a c
> build p f inp = [ (f x, rem) | (x, rem) <- p inp]
> 
> test = pList dig 
> t1 = pL1 dig 
> 
> 
> -----------------------------------------------------------------