[Haskell-cafe] Re: checking regular expressions

ChrisK haskell at list.mightyreason.com
Fri Nov 9 18:22:15 EST 2007


Hi,
  I wrote the regex-base API you are looking at.

Uwe Schmidt wrote:
> Hi all,
> 
> what's the simplest way to check, whether a given string
> is a wellformed regular expression?

import Text.Regex.Posix.String(compile)
or
import Text.Regex.Posix.ByteString(compile)
etc.

> 
> In the API there's just a "mkRegex" which does not make
> any checks, and the "matchRegex" which throws an exception
> when the regex isn't wellformed.
> 
> So do I need the IO monad for checking a regex?

No.  You can safely use the 'compile' functions above with unsafePerformIO.
(see more detail below...)

> 
> Uwe

The error reporting in the RegexMaker type class is poor in the version you are
using.  Sorry about that.

There are two workable solutions: call the underlying function directly and
avoid the type class, or upgrade regex-base (and regex-posix, etc.) to the newer API.

The first is to avoid the type class and use the particular function that it is
an interface to.  These worker functions are all called "compile" and have sane
error handling.

The regex-posix backend has a 'compile' that works on String in
Text.Regex.Posix.String [1]
The regex-posix backend has a 'compile' that works on ByteString in
Text.Regex.Posix.ByteString [2]
Other backends and types are under the same organization scheme.
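A minimal sketch of calling the String and ByteString worker functions directly in IO. It assumes the current regex-posix module layout, where each backend module also re-exports the option constants compExtended and execBlank, and that WrapError is a (ReturnCode, String) pair. checkString and checkByteString are hypothetical helper names, not part of the library.

```haskell
import qualified Data.ByteString.Char8 as B
import qualified Text.Regex.Posix.ByteString as PB
import qualified Text.Regex.Posix.String as PS

-- | Hypothetical helper: Nothing if the pattern compiles, otherwise the
-- error text reported by the C library (the String half of WrapError).
-- Uses extended POSIX syntax and no execution flags.
checkString :: String -> IO (Maybe String)
checkString pat = do
  result <- PS.compile PS.compExtended PS.execBlank pat
  return (either (Just . snd) (const Nothing) result)

-- | The same check for a ByteString pattern; the worker has the same shape,
-- only the source type differs.
checkByteString :: B.ByteString -> IO (Maybe String)
checkByteString pat = do
  result <- PB.compile PB.compExtended PB.execBlank pat
  return (either (Just . snd) (const Nothing) result)
```

An unmatched bracket such as "[abc" should come back as Just an error message, while "^[a-z]+$" should give Nothing.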

To illustrate the usage, take the Text.Regex.Posix.String function:

compile :: CompOption -> ExecOption -> String -> IO (Either WrapError Regex)

This technically calls into the C library and is tagged as IO.  The return type
gives the most informative error I could concoct from the posix backend.  This
compile function is hooked into the RegexMaker type class as:

> instance RegexMaker Regex CompOption ExecOption String where
>   makeRegexOpts c e pattern = unsafePerformIO $
>     (compile c e pattern >>= unwrap)
>   makeRegexOptsM c e pattern = either (fail.show) return $ unsafePerformIO $ 
>     (compile c e pattern)

And I do not think the IO is needed for safety, so unsafePerformIO is okay in
this case.  (This does rely on the C library being thread-safe, which is true
at least on Mac OS X.)
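Following the argument above, a minimal sketch of a pure well-formedness check built on Text.Regex.Posix.String.compile and unsafePerformIO. compileRegexPure and isWellFormed are hypothetical helper names, and the sketch inherits the same assumption that the C library is thread-safe.

```haskell
import System.IO.Unsafe (unsafePerformIO)
import Text.Regex.Posix.String (Regex, compile, compExtended, execBlank)

-- | Hypothetical pure wrapper around the IO-typed compile.
-- Left carries the C library's error text, Right the compiled Regex.
-- Assumes compiling a pattern has no observable side effects.
compileRegexPure :: String -> Either String Regex
compileRegexPure pat =
  case unsafePerformIO (compile compExtended execBlank pat) of
    Left (_code, msg) -> Left msg
    Right regex       -> Right regex

-- | True when the pattern compiles as an extended POSIX regex.
isWellFormed :: String -> Bool
isWellFormed = either (const False) (const True) . compileRegexPure
```

This is essentially what the instance above does, except that errors are returned as a value instead of being raised through unwrap or fail.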

The second solution is to update regex-base to the darcs version at
http://darcs.haskell.org/packages/regex-unstable/regex-base/
which has a newer RegexMaker API that adds what you are looking for:

>   -- | make using the defaultCompOpt and defaultExecOpt, reporting errors with fail
>   makeRegexM :: (Monad m) => source -> m regex
>   -- | Specify your own options, reporting errors with fail
>   makeRegexOptsM :: (Monad m) => compOpt -> execOpt -> source -> m regex

These functions can be used with Monad Maybe to give Nothing if there is a
problem compiling the regular expression, regardless of back-end and source
type.  If you upgrade regex-base you will also need a newer backend, such as the
updated regex-posix from under http://darcs.haskell.org/packages/regex-unstable/
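A minimal sketch of the Maybe approach with the newer API. It is written against current Hackage releases of regex-base and regex-posix rather than the darcs snapshot mentioned above; as far as I know the constraint on makeRegexM there is MonadFail rather than Monad, which Maybe satisfies. tryRegex, isValidRegex and safeMatch are hypothetical helper names.

```haskell
import Data.Maybe (isJust)
import Text.Regex.Base.RegexLike (makeRegexM, matchTest)
-- Importing this module also brings the RegexMaker/RegexLike instances
-- for Regex and String into scope.
import Text.Regex.Posix.String (Regex)

-- | Nothing when the pattern is malformed: the backend reports errors
-- with fail, which Maybe turns into Nothing.
tryRegex :: String -> Maybe Regex
tryRegex = makeRegexM

-- | True when the pattern compiles with the default options.
isValidRegex :: String -> Bool
isValidRegex = isJust . tryRegex

-- | Validate first, then match. Nothing means "bad pattern",
-- Just False means "valid pattern, no match".
safeMatch :: String -> String -> Maybe Bool
safeMatch pat input = do
  re <- tryRegex pat
  return (matchTest re input)
```

Keeping the bad-pattern case in the Maybe layer means a malformed regex can never be confused with a regex that simply failed to match.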

Cheers,
  Chris

[1]
http://www.haskell.org/ghc/docs/latest/html/libraries/regex-posix-0.72.0.1/Text-Regex-Posix-String.html#v%3Acompile
[2]
http://www.haskell.org/ghc/docs/latest/html/libraries/regex-posix-0.72.0.1/Text-Regex-Posix-ByteString.html#v%3Acompile


