[Haskell] Re: ANN: TextRegexLazy 0.44

Chris Kuklewicz haskell at list.mightyreason.com
Mon Jul 24 09:59:33 EDT 2006


John Meacham wrote:
 > On Fri, Jul 14, 2006 at 05:05:50PM +0100, Chris Kuklewicz wrote:
 >> As a user, the JRegex API can also only support a single Regex type and a
 >> single backend.  But it would be really handy to be able to use different
 >> types of regular expressions.  Mainly there are going to be different regex
 >> syntax possibilities:
 >
 > This isn't true, the API is a class, you can create as many instances as
 > you like for it. In fact, it comes with at least 2 back ends, and at
 > least a couple different instances for the regex syntax. It was
 > specifically designed as a framework for many regular expression
 > backends to be used via a common and useful interface.
 >
 >         John
 >

JRegex does require the source too be a list [x]:

 > class RegexContext x a where
 >     (=~) :: RegexLike r x => [x] -> r -> a
 >     (=~~) :: (Monad m, RegexLike r x) => [x] -> r -> m a
 >
 > class RegexLike r a | r -> a where
 >     matchTest :: r -> [a] -> Bool
 >     matchCount :: r -> [a] -> Int
 >     matchAll  :: r -> [a] -> [(Array Int (Int,Int))]
 >     matchOnce :: r -> [a] -> Bool -> Maybe (Array Int (Int,Int))

The List requirement precludes a ByteString instance.  The functional dependency 
"r->a" also prevents mixing different backends with different data source types.

The Bool parameter to matchOnce is there so matchAll can be implemented in terms 
of matchOnce, exploiting the fact that the source data type is a list.  (Though 
this is not very optimal compared to a specialized matchAll).

I am done rewriting the Posix regex and PCRE code with both String and 
ByteString as instances.  The latest type classes (from today) look like:

 > type MatchArray = Array Int (Int,Int) -- (starting index,length)
 >
 > class (RegexOptions regex compOpt execOpt) => RegexMaker regex source where
 >   makeRegex :: source -> regex
 >   makeRegexOpts :: compOpt -> execOpt -> source -> regex
 >
 > class RegexLike regex source where
 >   matchAll :: regex -> source -> [MatchArray]
 >   matchCount :: regex -> source -> Int
 >   matchOnce :: regex -> source -> Maybe MatchArray
 >   matchTest :: regex -> source -> Bool
 >   matchTest regex source = isJust (matchOnce r s)
 >   matchCount regex source = length (matchAll r s)

I have omitted the RegexOptions class for space (  The job of the "Bool" to 
matchOnce is subsumed by the more general execOpt handling ).  Clearly I have 
taken the names and most of the types from JRegex.  I don't have the cool 
polymorphic RegexContext yet, but that is the next step.

Once the code stabilizes at all, I will post a link to the development darcs 
address.

The flexibility of source data type and backend is provided by making 
WrapPosix.hsc and WrapPRCE.hsc modules that expect a source type of 
CString/CStringLen and are comprehensive enough so that the four files 
(Byte)?String(Posix|PCRE) are just .hs files (no -cpp needed) instead of a .hsc 
files.  And these four use optimized routines for match(All|Count|Once|Test) 
instead of using either of the defaults.

The next backend to make instances for will be my "Text.Regex.Lazy" one based on 
Parsec.  Then I will have 3 backends and two data source types, making 6 
combinations.  For example: I can compile a String regex as a PCRE and match 
that against a ByteString and against a String.  The type of the regex source 
and the type of the data are separate.  And I can make a new Regex from an old 
one with different execution options.

-- 
Chris



More information about the Haskell mailing list