replacing guile with haskell?

Thu Oct 23 22:58:47 EDT 2003

G'day all.

Quoting Graham Klyne <GK at ninebynine.org>:

> Thanks for this... it turns out I have an immediate use for it.  I've
> downloaded it but I'm having a little trouble figuring out how to drive
> it.  Is there any description anywhere?

The code isn't enough? :-)

> + The supplied regex is a value of type Re t, which describes an expression
> to match a sequence of tokens of type t.  In common use, t might be Char.

Right.

(One of the reasons that I wrote it was to experiment with QLR(k) parsing,
which occasionally requires traversals of the state stack.  In that case,
t is set of LR(0) states which correspond to the start of a production.)

> + The constructors for Re are:
>      ReOr [Re t]    -- matches any one of the Re's in the list
>    | ReCat [Re t]   -- matches a sequence of the RE's in the list
>    | ReStar (Re t)  -- matches zero or more of the given Re
>    | RePlus (Re t)  -- matches one or more of the given Re
>    | ReOpt (Re t)   -- matches zero or one of the given Re
>    | ReTerm [t]     -- matches a sequence of tokens exactly matching
>                        the given list.
> 
> That last is very much guesswork:  is it true that
>      ReTerm ts = ReCat $ map (\t ReTerm [t]) ts
> ?

Yes.  The idea is that Re Char is often used to match Strings, and it's
much more convenient to say (ReTerm "hello world") rather than the
alternative.

> A function to construct an (Re Char) from a simple textual representation
> would be handy.  Maybe I'll tackle that.

By all means.  Do you have a sourceforge account?  If so, I can easily
arrange checkin rights.  Better to call it something other than Dfa.

> Having got an Re value, matchRe is a function that applies it (after
> compilation) to a sequence of 't', returning True if the expression is
> matched, otherwise False.
> 
> Is this about right?

Yes.

> There's another function matchRe2, which seems to do something recursive
> with the regular expression but I can't figure out what.  Is it just a
> different implementation strategy?  It does seem to give the same answers.

Yes, it's the same.  matchRe2 is for testing the code generation phase.
It does the same thing as matchRe, only it interprets the internal
DFA representation instead of using going all the way to compiled code.

> I also noticed this comment in the code:
> [[
> Utility typeclasses for enforcing all the constraints we need
> on our monad's free type variables.  Note that this requires both
> -fglasgow-exts and -fallow-undecidable-instances in GHC to work
> properly.
> ]]

The code in question is this:

        class (Monad m, Ord t) => ReVars m t where { }
        instance (Monad m, Ord t) => ReVars m t where { }

You might be able to avoid -fallow-undecidable-instances if you instead
change it to this:

        class (Monad m, Ord t) => ReVars m t
        instance (Monad m, Ord t) => ReVars m t

Personally, I think we need a better way to do typeclass synonyms.

> I'm also wondering if there's a way to use this to match a leading
> subsequence (rather than the entire sequence of tokens supplied), and to
> discover which parts of the regexp have been matched.

Yes, but you'll have to write it. :-)

You might want to modify matchRecursiveDfa rather than hacking the code
generator, because that's going to be much easier to understand.  Also
note that if you want to apply the "longest match" rule, there will need
to be support for backtracking.

The Knuth-Morris-Pratt matcher on the wiki might give you some ideas on
how to do this.

Cheers,
Andrew Bromage