Fast Haskell Parser

Simon Marlow marlowsd at gmail.com
Thu Mar 11 05:28:44 EST 2010


On 11/03/2010 01:25, John D. Earle wrote:
> Hi, Ben! Thanks for the input. I went to the Parsec and Attoparsec
> parser links. Attoparsec was new to me. From the Parsec link:
>
> Combinator parsers are written and used within the same programming
> language as the rest of the program. The parsers are first-class
> citizens of the language, unlike Happy parsers, which must be generated
> via a preprocessor.
>
> End Quote
>
> I may want to go with Parsec or cousin just because the approach seems
> elegant. What I am fishing for is something you only learn from
> experience or learn from talking to people who have experience. My
> impression is if you want speed, Happy is the way to go. I will be
> making an investment in one or the other paradigm so I need to
> understand the relative merits of each in order to make a decision.
>
> I would think Happy is a dark place whereas Parsec is a place of light
> which becomes important as correctness that I can personally attest to
> is concerned. What I am getting at is my impression is with Parsec the
> concepts involved are more important than the actual code itself. I
> suspect with Happy you could understand the concepts involved and the
> tool will continue to be necessary. I value the educational experience.
> Parsec or cousin may provide a better educational experience. Is Parsec
> slow like a snail compared to Happy? or are they similar? Parsec
> certainly seems more flexible, but my question concerns performance. You
> know the embarrassing facts that you won't find in the brochure.

Happy/Alex tend to work well for programming-language type tasks where 
there is a clear lexical syntax and the grammar is close to LALR(1). 
Parsec works well for more ad-hoc grammars, and where there is 
ambiguity; Parsec tends to be more flexible in that regard.  Happy on 
the other hand will guarantee that your grammar has no ambiguity, and 
Alex will be more efficient than parsing the equivalent regular 
expressions in a combinator library because it is doing the NFA->DFA 
conversion beforehand (although IIRC the Utrecht combinator library does 
this at runtime?).

Personally I find Parsec and the other parser combinator libraries quite 
difficult to use when it comes to deciding where to put 'try'; ReadP is 
the exception here, because it does general backtracking and doesn't 
make you decide where to use 'try'.  I had an interesting experience 
with writing GHC's parser for the strings inside foreign import 
declarations recently - I tried various different combinator libraries 
to see how it came out, and none of them made it easy.  I must write a 
blog post about that sometime.

As for performance, I don't think anyone has done a rigorous comparison 
- it's hard to do, becuase you have to implement a whole parser twice in 
very different ways, and then you only get results for one grammar. 
Alex and Happy are fast enough for GHC - parsing is almost never the 
bottleneck.

Cheers,
	Simon

> Thank you this has helped clarify my thinking.
>
> --------------------------------------------------
> From: "Ben Lippmeier" <benl at ouroborus.net>
> Sent: 10 Wednesday March 2010 1734
> To: "John D. Earle" <JohnDEarle at cox.net>
> Cc: <haskell-cafe at haskell.org>
> Subject: Re: Fast Haskell Parser
>
>>
>> Hi John,
>> Doing a Google search for "haskell parser" returns the following link
>> as its first result. That's the parser that GHC uses.
>>
>> http://www.haskell.org/happy/
>>
>> You could also check out the following:
>>
>> http://www.haskell.org/haskellwiki/Parsec
>> http://hackage.haskell.org/package/attoparsec
>>
>> This would also be a perfect question to ask on the haskell-cafe
>> mailing list...
>>
>> Cheers,
>> Ben.
>>
>>
>> On 11/03/2010, at 10:39 AM, John D. Earle wrote:
>>
>>> I was thinking of ways to create an efficient Haskell parser. My
>>> initial thinking was to write a top down parser in Haskell, but if
>>> you want speed a table driven approach may make greater sense.
>>>
>>> Due to the existence of build bots there is a certain degree of
>>> compliancy concerning build times. I feel that convenience is an
>>> important factor. It should be convenient to build the source. Build
>>> bots make an assumption, namely the existence of a formal
>>> infrastructure. I believe that it should be possible to build
>>> something from source casually.
>>>
>>> This is a less demanding goal than high performance incremental
>>> builds. It would be nice to out perform make files because if you
>>> fail to do this, can it really be said that you are making progress?
>>> Correctness appears to be a low priority among computer programmers.
>>> That said, it may be worth investing some time in advance to figuring
>>> out how to best achieve both objectives, namely correctness and
>>> performance. Who knows skills acquired in one project may be useful
>>> in another and performance is usually welcome.
>>>
>>> So my question is, What sort of tools and methodologies exist in
>>> Haskell to create high performance parsers? My impression is the
>>> speed at which the parser performs its task is not the bottle-neck,
>>> but the parser might as well be designed to be efficient so as not to
>>> be intellectually lazy. It may even turn out that the parser may need
>>> to be efficient merely to compensate for the spawn of correctness,
>>> namely slow builds.
>>> _______________________________________________
>>> Cvs-ghc mailing list
>>> Cvs-ghc at haskell.org
>>> http://www.haskell.org/mailman/listinfo/cvs-ghc
>>
>
> _______________________________________________
> Cvs-ghc mailing list
> Cvs-ghc at haskell.org
> http://www.haskell.org/mailman/listinfo/cvs-ghc



More information about the Cvs-ghc mailing list