[Haskell-cafe] Token parsers in parsec consume trailing whitespace

Mon Dec 14 18:44:37 EST 2009

Hi Edward,

> 1. Is there a more elegant way of doing number parsing?  In
> particular, are there token parsers that don't consume trailing
> whitespace, or is there a better way to do this with the
> primitives.

Parsec defines a combinator called 'lexeme', and the tokenizer wraps
each of its parsers in it.  The purpose of the tokenizer is to create
a set of parsing combinators that skip whitespace and comments, and
that provide other conveniences such as checking for collisions with
reserved keywords.  Consuming the trailing whitespace is not a bug;
it is part of that abstraction layer, and Parsec is consistent about
using this abstraction only in the Token module.
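A minimal sketch of that behaviour, assuming parsec 3 and its stock
`haskellDef` language definition (chosen only for illustration).
`spaceLexeme` is a hypothetical, simplified analogue of 'lexeme' that
skips only whitespace; the real one runs the token parser's
`whiteSpace`, which also skips comments.

```haskell
import Text.Parsec
import Text.Parsec.Language (haskellDef)
import Text.Parsec.String (Parser)
import qualified Text.Parsec.Token as P

-- A token parser built from Parsec's stock Haskell language definition.
lexer :: P.TokenParser ()
lexer = P.makeTokenParser haskellDef

-- Hypothetical, simplified analogue of 'P.lexeme': run p, then skip any
-- whitespace after it.  (The real 'P.lexeme' also skips comments.)
spaceLexeme :: Parser a -> Parser a
spaceLexeme p = p <* spaces

-- Run a parser and also return the input it left unconsumed.
runWithRest :: Parser a -> String -> Either ParseError (a, String)
runWithRest p = parse ((,) <$> p <*> getInput) ""

-- 'P.natural' is wrapped in 'P.lexeme', so the spaces after "42" are eaten.
viaToken :: Either ParseError (Integer, String)
viaToken = runWithRest (P.natural lexer) "42   rest"

-- The hand-rolled wrapper has the same effect on a plain 'many1 digit'.
viaSpaceLexeme :: Either ParseError (String, String)
viaSpaceLexeme = runWithRest (spaceLexeme (many1 digit)) "42   rest"
```

In GHCi, `viaToken` evaluates to `Right (42,"rest")`: the whitespace
is gone before the caller ever sees the remaining input.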

It's too bad that the 'nat' function in Token is not defined in
Parsec's Char module; because of that, you need to copy-paste that
code or roll your own.
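If you do roll your own, a minimal sketch might look like this.
`decimalNat` is a hypothetical name, and it is simpler than Token's
internal 'nat': it reads decimal digits only, with no hex or octal
prefixes.  Because it is built from Char-module primitives alone, it
leaves any trailing whitespace for the caller to deal with.

```haskell
import Data.Char (digitToInt)
import Data.List (foldl')
import Text.Parsec
import Text.Parsec.String (Parser)

-- Hypothetical helper: a decimal natural number using only Char-module
-- primitives.  No lexeme wrapper, so trailing whitespace is left alone.
decimalNat :: Parser Integer
decimalNat = foldl' step 0 <$> many1 digit
  where
    -- Accumulate digits left to right: "42" -> (0*10 + 4)*10 + 2.
    step acc d = acc * 10 + toInteger (digitToInt d)

-- Run a parser and also return the input it left unconsumed.
runWithRest :: Parser a -> String -> Either ParseError (a, String)
runWithRest p = parse ((,) <$> p <*> getInput) ""
```

Here `runWithRest decimalNat "42   rest"` yields `Right (42,"   rest")`.
Where whitespace really is insignificant, the caller can opt in
explicitly, e.g. `decimalNat <* spaces`.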

> It seems that the "token" approach of parsing lends itself
> to a different style of parsing than the one I'm doing

That's correct.  It sounds to me like you shouldn't bother creating a
tokenizer.  You might even be able to get away with using a regex
library instead of Parsec.
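To illustrate the tokenizer-free style, here is a minimal sketch that
parses a clock time such as "18:44:37" with Char primitives only.
`twoDigits`, `clockTime`, and `parseClock` are hypothetical names
chosen for this example.  Since no lexeme wrapper is involved,
whitespace is never skipped implicitly, so "18: 44:37" is rejected
rather than silently accepted.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Hypothetical example: exactly two decimal digits, read as an Int.
twoDigits :: Parser Int
twoDigits = read <$> count 2 digit

-- "HH:MM:SS" with no whitespace allowed anywhere, then end of input.
clockTime :: Parser (Int, Int, Int)
clockTime = do
  h <- twoDigits
  _ <- char ':'
  m <- twoDigits
  _ <- char ':'
  s <- twoDigits
  eof
  return (h, m, s)

-- Convenience wrapper that discards the error message on failure.
parseClock :: String -> Maybe (Int, Int, Int)
parseClock = either (const Nothing) Just . parse clockTime ""
```

With this, `parseClock "18:44:37"` gives `Just (18,44,37)` and
`parseClock "18: 44:37"` gives `Nothing`.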

-Greg