<br><br><div class="gmail_quote">On Thu, Mar 14, 2013 at 7:53 AM, Duncan Coutts <span dir="ltr">&lt;<a href="mailto:duncan.coutts@googlemail.com" target="_blank">duncan.coutts@googlemail.com</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

Hi folks,<br>

<br>

I want to give you advance notice that I would like to make Cabal depend<br>

on parsec. The implication is that GHC would therefore depend on parsec<br>

and thus it would become a core package, rather than just a HP package.<br>

So this would affect both GHC and the HP, though I hope not too much.<br>

<br>

The rationale is that Cabal needs to parse things, like .cabal files and<br>

currently we do not have a decent parser in the core libraries. By<br>

decent I mean one that can produce error messages with source locations<br>

and that doesn&#39;t have unpredictable memory use. The only parser in the<br>

core libraries at the moment is Text.ParserCombinators.ReadP from the<br>

base package and that fails my &quot;decent&quot; criteria on both counts. Its<br>

idea of an error message is (), and on some largish .cabal files we take<br>

100s of MB to parse (I realise that the ReadP in the base package is a<br>

cutdown version so I don&#39;t mean to malign all ReadP-style libs out<br>

there).<br>

<br>

Partly due to the performance problem, the terrible .cabal file error<br>

messages, and partly because Doaitse Swierstra keeps asking me if .cabal<br>

files have a grammar, I&#39;ve been writing a new .cabal parser. It uses an<br>

alex lexer and a parsec parser. It&#39;s fast and the error messages are<br>

pretty good. I have reverse engineered a grammar that closely matches<br>

the existing parser and .cabal files in the wild, though I&#39;m not sure<br>

Doaitse will be satisfied with the approach I&#39;ve taken to handling<br>

layout.<br>

<br>

Why did I choose parsec? Practicality dictates that I can only use<br>

things in the core libraries, and the nearest thing we have to that is<br>

the parser lib that is in the HP. I tried to use happy but I could not<br>

construct a grammar/lexer combo to handle the layout (also, happy is not<br>

exactly known for its great error messages).<br></blockquote><div><br></div><div>Failed attempt aside for a moment, I think you should reconsider happy. Can you learn how to do layout from reading the GHC source? The happy documentation that explains how to attach a monad (you could use it to communicate between alex and happy for layout info) is a bit misleading but I have examples I can share with you. I haven&#39;t specifically tackled the layout problem but I could try to make a parser if it would help.</div>

<div><br></div><div>One major benefit of using happy is that the productions of the grammar can be analyzed for shift/shift and shift/reduce conflicts. The equivalent analysis doesn&#39;t appear to be possible in parsec. In theory, applicative parsers should allow for this but my understanding is that parsec does not have this feature for its applicative subset.</div>

<div><br></div><div>Other benefits are: a) GHC can certainly use parers generated by it, b) the generated code uses common dependencies, c) it&#39;s fast, d) it&#39;s expressive.</div><div><br></div><div>What is it about happy parser errors that you don&#39;t like? Do you know examples where parsec does a better job?</div>

<div><br></div><div>I have an alex + happy parser for a tiny functional language that I can share with you if you&#39;d like to give it another go. It doesn&#39;t support layout at the moment, but I think I could add that.</div>

<div><br></div><div>Jason</div></div>