[Haskell-cafe] Techniques for ensuring parser correctness?

S. Doaitse Swierstra doaitse at swierstra.net
Mon Jul 26 03:03:14 EDT 2010


On 26 jul 2010, at 03:51, Jason Dagit wrote:

> Hello,
> 
> I find that parser correctness is often hard to verify.  Therefore, I'm interested in techniques that others have used successfully, especially with Haskell.

It seems to me that you are not so much trying to verify parsers, but more specifically Parsec parsers. Since in Parsec-based parsers you control the backtracking explicitly such parsers can get very complicated semantics. Now the question arises: what does it mean for a (Parsec) parser to be correct? Do you have another description of the  language which is to be recognised, e.g. a context-free grammar. Only then can you give meaning to the word "correctness".

In general I think that the more your parser combinators deviate from context-free grammars in terms of expressiveness, the more problems you will encounter. If you make heavy use of the monadic part, you will not only have to prove the correctness of static parsers, but even of parsers which are generated dynamically. If you use the backtrack-controlling features, your proofs will become even more complicated, since it is unlikely that your more abstract formalism in which you have specified your language does not have a similar construct: here comes in about 50 years on research on parsing techniques and grammar analysis. If your grammar is e.g. LL(1) then you can verify that some of the back-tracking-controlling features in your Parser parser have been used in a sound way, i.e., that you will be able to parse any sentence that your grammar describes.

If you have a context-free grammar, and you want to be relatively sure that the parser is correct and you do not want to go through large verification efforts I suggest you use the uu-parsinglib; the only restriction there is is that your grammar should fulfill certain modest "well-formedness" criteria, such as being non-left-recursive and non-ambiguous. Then  the semantics of the combinators are exactly what you want, i.e. your parsers and your grammars are isomorphic. If you have however an "incorrect formal specification", i.e., a specification which contains ambiguous non-terminals like p* constructs where p can reduce to an empty string   things break. The first problem one is not recognised and will lead to a non-terminating parser, whereas the second problem is detected by the grammars analysing themselves while being used, and leading to a run-time error message once you reach that part of the grammar during parsing.

If you insist on using left-recursive parsers you may use the left-corner transform from the 

http://hackage.haskell.org/packages/archive/ChristmasTree/0.2/doc/html/Text-GRead-Transformations-LeftCorner.html

package, or use a parser generator like happy; parser generators usually do some form of analysis (i.e. proving properties), which captures many mistakes in the design of a language.

Furthermore you may take a look at: 
@inproceedings{DBLP:conf/mpc/BrinkHL10,
  author    = {Kasper Brink and
               Stefan Holdermans and
               Andres L{\"o}h},
  title     = {Dependently Typed Grammars},
  booktitle = {MPC},
  year      = {2010},
  pages     = {58-79},
  ee        = {http://dx.doi.org/10.1007/978-3-642-13321-3_6},
  crossref  = {DBLP:conf/mpc/2010},
  bibsource = {DBLP, http://dblp.uni-trier.de}

Doaitse Swierstra










> 
> Techniques I'm aware of:
>   * Round trip checks: Generate a datastructure, render as a string, parse back, and compare.  Quickcheck can be used to automate this.
>   * Fuzz testing:  What tools exist to help me?
>   * Formal verification: Has anyone been using this with Haskell parsers?  Other than general theorem provers, say Isabelle, what tools exist?
> 
> My specific need:
> The immediate challenge I have is that I'm modifying the parser that Darcs uses and we would like to improve the parser's test suite as well.  The parser abstraction used in this case follows parsec's API.  Left to my own devices I would use round trip checks, written with quickcheck, for this exercise.  Because we're using a parsec style parser, I don't have a nice neat grammar handy.
> 
> Thanks in advance for any advice you have!
> 
> Thanks,
> Jason
> _______________________________________________
> Haskell-Cafe mailing list
> Haskell-Cafe at haskell.org
> http://www.haskell.org/mailman/listinfo/haskell-cafe

-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://www.haskell.org/pipermail/haskell-cafe/attachments/20100726/b20e1dd1/attachment.html


More information about the Haskell-Cafe mailing list