[Haskell-cafe] attoparsec and backtracking

oleg at okmij.org oleg at okmij.org
Tue Mar 19 07:52:24 CET 2013


Wren Thornton wrote:
> I had some similar issues recently. The trick is figuring out how to
> convince attoparsec to commit to a particular alternative. For example,
> consider the grammar: A (B A)* C; where if the B succeeds then we want to
> commit to parsing an A (and if it fails then return A's error, not C's).

Indeed. Consider the following (greatly simplified) fragment from the
OCaml grammar

        | "let"; r = opt_rec; bi = binding; "in";
           x = expr LEVEL ";" ->
        | "function"; a = match_case ->
        | "if"; e1 = SELF; "then"; e2 = expr LEVEL "top";
          "else"; e3 = expr LEVEL "top" ->
...
        | "false" -> 
        | "true"  -> 

It would be bizarre if the parser -- upon seeing "if" but not finding
"then" -- would've reported the error that `found "if" when "true" was
expected'. Many people would think that when the parser comes across
"if", it should commit to parsing the conditional. And if it fails later, it
should report the error with the conditional, rather than trying to
test how else the conditional cannot be parsed. This is exactly the
intuition of pattern matching. For example, given

> foo ("if":t) = case t of
>                  (e:"then":_) -> e
> foo _ = ""

we expect that 
        foo ["if","false","false"]
will throw an exception rather than return the empty string. If the
pattern has matched, we are committed to the corresponding
branch. Such an intuition ought to apply to parsing -- and indeed it
does. The OCaml grammar above was taken from the camlp4 code. Camlp4
parsers

        http://caml.inria.fr/pub/docs/tutorial-camlp4/tutorial002.html#toc6

do pattern-matching on a stream, for example
     # let rec expr =
         parser
           [< 'If; x = expr; 'Then; y = expr; 'Else; z = expr >] -> "if"
         | [< 'Let; 'Ident x; 'Equal; x = expr; 'In; y = expr >] -> "let"

and raise two different sort of exceptions. A parser raises
Stream.Failure if it failed on the first element of the stream (in the
above case, if the stream contains neither If nor Let). If the parser
successfully consumed the first element but failed later, a different
Stream.Error is thrown. Although Camlp4 has many detractors, even they
admit that the parsing technology by itself is surprisingly powerful,
and produces error messages that are oftentimes better than those by
the yacc-like, native OCaml parser. Camlp4 parsers are used
extensively in Coq.

The idea of two different failures may help in the case of attoparsec
or parsec. Regular parser failure initiates backtracking. If we wish
to terminate the parser, we should raise the exception (and cut the
rest of the choice points). Perhaps the could be a combinator `commit'
that converts a failure to the exception. In the original example
A (B A)* C we would use it as A (B (commit A))* C.





More information about the Haskell-Cafe mailing list