[Haskell-cafe] Lazy Parsing (ANN: vcd-0.1.4)

Tue Apr 27 07:23:10 EDT 2010

I had been using Parsec to parse VCD files, but needed to parse
streaming data lazily.  After stumbling on the thread below, I switched
to polyparse.

What a great library!  I was able to migrate from a strict to a
semi-lazy parser and many of my parse reductions didn't even need to
change.  Thanks Malcolm!

In addition to lazy VCD parsing, this version of vcd [1] also includes
step', which forces a step regardless of whether any variables have
changed -- helpful for realtime simulation.

(BTW, parsec is a great library too.)

-Tom

[1] http://hackage.haskell.org/package/vcd-0.1.4

On Sun, May 31, 2009 at 6:41 AM, Malcolm Wallace
<malcolm.wallace at cs.york.ac.uk> wrote:
>
> I don't know whether you will be willing to change over to polyparse
> library, but here are some hints about how you might use it.
>
> Given that you want the input to be a simple character stream, rather than
> use a more elaborate lexer, the first thing to do is to specialise the
> parser type for your purposes:
>
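>> import qualified Data.Char as Char
>> import Data.Char (digitToInt)
>> import Text.ParserCombinators.Poly.Lazy
>>
>> type TextParser a = Parser Char a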
>
> Now, to recognise a "mere digit",
>
>> digit :: TextParser Char
>> digit = satisfy Char.isDigit
>
> and for a sequence of digits forming an unsigned integer:
>
>> integer :: TextParser Integer
>> integer = do ds <- many1 digit
>>              return (foldl1 (\n d-> n*10+d)
>>                             (map (fromIntegral.digitToInt) ds))
>>           `adjustErr` (++("expected one or more digits"))
>
>> I mean I'd like to be able to turn "12.05.2009" into something like (12,
>> 5, 2009) and got no clue what the code would have to look like. I do know
>> almost every variation what the code must not look like :).
>
>> date = do a <- integer
>>           satisfy (=='.')
>>           b <- integer
>>           satisfy (=='.')
>>           c <- integer
>>           return (a,b,c)
>
> Of course, that is just the standard (strict) monadic interface used by many
> combinator libraries.  Your original desire was for lazy parsing, and to
> achieve that, you must move over to the applicative interface.  The key
> difference is that you cannot name intermediate values, but must construct
> larger values directly from smaller ones by something like function
> application.
>
>> lazydate = return (,,) `apply` integer `discard` dot
>>                        `apply` integer `discard` dot
>>                        `apply` integer
>>    where dot = satisfy (=='.')
>
> The (,,) is the constructor function for triples.  The `discard` combinator
> ensures that its second argument parses OK, but throws away its result,
> keeping only the result of its first argument.
>
> Apart from lazy space behaviour, the main observable difference between
> "date" and "lazydate" is when errors are reported on incorrect input.  For
> instance:
>
>  > fst $ runParser date "12.05..2009"
>  *** Exception: In a sequence:
>  Parse.satisfy: failed
>  expected one or more digits
>
>  > fst $ runParser lazydate "12.05..2009"
>  (12,5,*** Exception: In a sequence:
>  Parse.satisfy: failed
>  expected one or more digits
>
> Notice how the lazy parser managed to build the first two elements of the
> triple, whilst the strict parser gave no value at all.
>
> I know that the error messages shown here are not entirely satisfactory, but
> they can be improved significantly just by making greater use of the
> `adjustErr` combinator in lots more places (it is rather like Parsec's <?>).
> Errors containing positional information about the input can be constructed
> by introducing a separate lexical tokenizer, which is also not difficult.
>
> Regards,
>    Malcolm
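
The strict/lazy difference described in the quoted message can be
reproduced without any parsing library.  The following is only a rough
sketch using base, not polyparse's actual implementation: the types Lazy
and Strict and every function ending in L or S are hypothetical names
invented for this illustration.  The lazy parser reports failure by
leaving an error thunk inside the result, while the strict one returns an
Either and so must finish the whole parse before it can return anything.

```haskell
import Data.Char (digitToInt, isDigit)

-- Hypothetical lazy parser: a failure is an 'error' thunk, raised only when
-- the failing part of the result (or the input after it) is demanded.
newtype Lazy a = Lazy { runLazy :: String -> (a, String) }

-- Hypothetical strict parser: success or failure is decided before any
-- part of the value is returned.
newtype Strict a = Strict { runStrict :: String -> Either String (a, String) }

digitsToInteger :: String -> Integer
digitsToInteger = foldl (\n d -> n * 10 + toInteger (digitToInt d)) 0

pureL :: a -> Lazy a
pureL x = Lazy (\s -> (x, s))

-- The lazy 'let' patterns postpone both sub-parses until their results are used.
applyL :: Lazy (a -> b) -> Lazy a -> Lazy b
applyL pf px = Lazy $ \s ->
  let (f, s')  = runLazy pf s
      (x, s'') = runLazy px s'
  in  (f x, s'')

-- Keep the first result; the second parser runs once the remaining input is needed.
discardL :: Lazy a -> Lazy b -> Lazy a
discardL px py = Lazy $ \s ->
  let (x, s')  = runLazy px s
      (_, s'') = runLazy py s'
  in  (x, s'')

integerL :: Lazy Integer
integerL = Lazy $ \s -> case span isDigit s of
  ([], _)    -> error "expected one or more digits"
  (ds, rest) -> (digitsToInteger ds, rest)

charL :: Char -> Lazy Char
charL c = Lazy $ \s -> case s of
  (x : rest) | x == c -> (x, rest)
  _                   -> error ("expected " ++ show c)

pureS :: a -> Strict a
pureS x = Strict (\s -> Right (x, s))

-- Sequencing in the Either monad stops at the first Left.
applyS :: Strict (a -> b) -> Strict a -> Strict b
applyS pf px = Strict $ \s -> do
  (f, s')  <- runStrict pf s
  (x, s'') <- runStrict px s'
  return (f x, s'')

discardS :: Strict a -> Strict b -> Strict a
discardS px py = Strict $ \s -> do
  (x, s')  <- runStrict px s
  (_, s'') <- runStrict py s'
  return (x, s'')

integerS :: Strict Integer
integerS = Strict $ \s -> case span isDigit s of
  ([], _)    -> Left "expected one or more digits"
  (ds, rest) -> Right (digitsToInteger ds, rest)

charS :: Char -> Strict Char
charS c = Strict $ \s -> case s of
  (x : rest) | x == c -> Right (x, rest)
  _                   -> Left ("expected " ++ show c)

lazyDate :: Lazy (Integer, Integer, Integer)
lazyDate = pureL (,,) `applyL` integerL `discardL` charL '.'
                      `applyL` integerL `discardL` charL '.'
                      `applyL` integerL

strictDate :: Strict (Integer, Integer, Integer)
strictDate = pureS (,,) `applyS` integerS `discardS` charS '.'
                        `applyS` integerS `discardS` charS '.'
                        `applyS` integerS
```

In GHCi, evaluating fst (runLazy lazyDate "12.05..2009") prints "(12,5,"
before the error surfaces, whereas runStrict strictDate "12.05..2009"
yields a Left with no partial triple at all.  The trade-off is the one
the quoted message points out: the lazy version can hand back the front
of a result early, but a parse error then arrives later, in the middle
of consuming that result.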