[Haskell-cafe] Regex API ideas

ChrisK haskell at list.mightyreason.com
Thu Nov 1 14:16:30 EDT 2007


Hi Bryan,

  I wrote the current regex API, so your suggestions are interesting to me.  The
same goes for anyone else's opinions on the regex API, of course.

Bryan O'Sullivan wrote:
> Ketil Malde wrote:
> 
>> Python used to do pretty well here compared
>> to Haskell, with rather efficient hashes and text parsing, although I
>> suspect ByteString IO and other optimizations may have changed that
>> now. 

> 
> It still does just fine.  For typical "munge a file with regexps, lists,
> and maps" tasks, Python and Perl remain on par with comparably written
> Haskell.  This is because the scripting-level code acts as a thin layer of
> glue around I/O, regexps, lists, and dicts, all of which are written in
> native code.
> 
> The Haskell regexp libraries actually give us something of a leg down
> with respect to Python and Perl.

True, the pure Haskell library is not as fast as a C library.  In particular,
the current regex-tdfa handles lazy ByteStrings suboptimally.  This
may eventually be fixed.

But the native code libraries have also been wrapped in the same API, and they
are quite fast when combined with strict ByteStrings.
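A minimal sketch of what that looks like, assuming the regex-posix package (the wrapper around the system's POSIX C regex library) and a made-up log line held in a strict ByteString. The pattern is an ordinary String, and the annotated result type of (=~) selects what comes back:

```haskell
import qualified Data.ByteString.Char8 as B
import Text.Regex.Posix ((=~))

-- Made-up input, used only for illustration.
logLine :: B.ByteString
logLine = B.pack "2007-11-01 ERROR disk full"

-- Target type Bool: does the pattern occur anywhere?
hasError :: Bool
hasError = logLine =~ "ERROR"

-- Target type ByteString: the first match (empty if there is none).
datePart :: B.ByteString
datePart = logLine =~ "[0-9]+-[0-9]+-[0-9]+"
```

These are the same (=~) calls one would write against String input; only the import and the input type differ.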

> The aggressive use of polymorphism in
> the return type of (=~) makes it hard to remember which of the possible
> return types gives me what information.  Not only did I write a regexp
> tutorial to understand the API in the first place, I have to reread it
> every time I want to match a regexp.

The (=~) operator can return any type that has a RegexContext instance.  These
instances are all thin wrappers around the monomorphic methods of the RegexLike
class (matchOnce, matchAll, matchTest, and so on).  So (=~) could be avoided
altogether, or another API could be built on top of RegexLike.
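For illustration, a small sketch assuming the regex-tdfa package, whose Text.Regex.TDFA module exports (=~), makeRegex, getAllTextMatches, and the RegexLike methods.  The pattern and input are made up.  The first group of bindings lets a type annotation pick a RegexContext instance; the second names the RegexLike method directly:

```haskell
import Data.Array ((!))
import Text.Regex.TDFA

-- A pattern with one capture group: letters followed by digits.
pat :: String
pat = "([a-z]+)[0-9]+"

input :: String
input = "abc12 de345 f6"

-- Polymorphic (=~): the annotation selects the RegexContext instance.
isMatch :: Bool
isMatch = input =~ pat

firstMatch :: String
firstMatch = input =~ pat

-- (text before the match, the match, text after the match)
splitAround :: (String, String, String)
splitAround = input =~ pat

allMatches :: [String]
allMatches = getAllTextMatches (input =~ pat)

-- The same information from the monomorphic RegexLike methods.
compiled :: Regex
compiled = makeRegex pat

isMatch' :: Bool
isMatch' = matchTest compiled input

-- Index 0 of each MatchText is the whole match; take its text.
allMatches' :: [String]
allMatches' = map (fst . (! 0)) (matchAllText compiled input)
```

Here firstMatch is "abc12", splitAround is ("", "abc12", " de345 f6"), and both allMatches and allMatches' are ["abc12", "de345", "f6"].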

> 
> A suitable solution would be a return type of RegexpMatch a => Maybe a
> (to live alongside the existing types, but aiming to become the one
> that's easy to remember), with appropriate methods on a, but I don't
> have time to write up a patch.
> 
>     <b

The (=~~) operator is the monadic version of (=~), which allows for different
failure behaviors.  So using (=~~) with Maybe is already possible, and it gives
Nothing whenever there are zero matches.
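A minimal sketch of the Maybe case, again assuming regex-tdfa; firstNumber and firstNumberSplit are hypothetical helpers made up for this example.  With (=~) a missing match would give "" or ("", "", "")-style defaults, while (=~~) in Maybe makes it Nothing:

```haskell
import Text.Regex.TDFA

-- First run of digits, or Nothing if the input has none.
firstNumber :: String -> Maybe String
firstNumber s = s =~~ "[0-9]+"

-- Same pattern, returned as (before, match, after), or Nothing.
firstNumberSplit :: String -> Maybe (String, String, String)
firstNumberSplit s = s =~~ "[0-9]+"
```

For example, firstNumber "abc 42 x" is Just "42" and firstNumber "no digits" is Nothing.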

But I am more interested in learning what API you would like to see.
What would you like code that uses the API to look like?
Could you sketch either the definition or the usage of your RegexpMatch class suggestion?
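To make the question concrete, here is one possible shape for such a class.  This is only a hypothetical sketch: RegexpMatch's methods, SimpleMatch, and matchRegexp are made-up names, and the single instance is built on the existing matchOnceText method from RegexLike, assuming regex-tdfa as the backend:

```haskell
import Data.Array (elems)
import Text.Regex.TDFA

-- Hypothetical class: one result type with accessor methods,
-- always returned in Maybe so "no match" is Nothing.
class RegexpMatch a where
  fromMatchText :: (String, MatchText String, String) -> a
  matchedText   :: a -> String
  subgroups     :: a -> [String]
  beforeMatch   :: a -> String
  afterMatch    :: a -> String

-- Hypothetical concrete result type.
data SimpleMatch = SimpleMatch
  { prefixText :: String
  , wholeText  :: String
  , groupTexts :: [String]
  , suffixText :: String
  } deriving (Show, Eq)

instance RegexpMatch SimpleMatch where
  fromMatchText (pre, mt, post) =
    -- elems lists index 0 (the whole match) first, then each subgroup.
    case map fst (elems mt) of
      (whole : gs) -> SimpleMatch pre whole gs post
      []           -> SimpleMatch pre "" [] post  -- not expected: index 0 is always present
  matchedText = wholeText
  subgroups   = groupTexts
  beforeMatch = prefixText
  afterMatch  = suffixText

-- Hypothetical entry point: compile the pattern, match once, wrap the result.
matchRegexp :: RegexpMatch a => String -> String -> Maybe a
matchRegexp pat s = fmap fromMatchText (matchOnceText (makeRegex pat :: Regex) s)
```

With this sketch, matchRegexp "([a-z]+)=([0-9]+)" "x: key=42;" :: Maybe SimpleMatch yields a match whose matchedText is "key=42" and whose subgroups are ["key", "42"], and an input with no match yields Nothing.  The appeal of this design is that there is one result type to remember instead of a choice among many RegexContext instances.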

I don't use my own regex API much, so external feedback and ideas would be
wonderful.

-- 
Chris



More information about the Haskell-Cafe mailing list