
The `Regex' and `MatchPS' interfaces

(Sigbjorn Finne supplied the regular-expressions interface.)

The `Regex' library provides a fairly direct interface to the GNU regular-expression library for manipulating `_PackedString's. If you are working at this level, you will probably need to consult the GNU documentation as well.

The datatypes and functions that `Regex' provides are:

data PatBuffer  -- just a bunch of bytes (mutable)

data REmatch
 = REmatch (Array Int GroupBounds)  -- for $1, ... $n
           GroupBounds              -- for $` (everything before match)
           GroupBounds              -- for $& (entire matched string)
           GroupBounds              -- for $' (everything after)
           GroupBounds              -- for $+ (matched by last bracket)

-- A GroupBounds holds the interval where a group
-- matched inside a string, e.g.
--
-- matching "reg(exp)" against "a regexp" returns the pair (5,7) for
-- the (exp) group. (_PackedString indices start from 0.)

type GroupBounds = (Int, Int)

re_compile_pattern
        :: _PackedString        -- pattern to compile
        -> Bool                 -- True <=> assume single-line mode
        -> Bool                 -- True <=> case-insensitive
        -> PrimIO PatBuffer

re_match :: PatBuffer           -- compiled regexp
         -> _PackedString       -- string to match
         -> Int                 -- start position
         -> Bool                -- True <=> record results in registers
         -> PrimIO (Maybe REmatch)

-- Matching on 2 strings is useful when you're dealing with multiple
-- buffers. This could prove useful for _PackedStrings: rather than
-- stuffing the contents of a file into one massive heap chunk, we
-- can load smaller chunks on demand.

re_match2 :: PatBuffer          -- 2-string version
          -> _PackedString
          -> _PackedString
          -> Int
          -> Int
          -> Bool
          -> PrimIO (Maybe REmatch)

re_search :: PatBuffer          -- compiled regexp
          -> _PackedString      -- string to search
          -> Int                -- start index
          -> Int                -- stop index
          -> Bool               -- True <=> record results in registers
          -> PrimIO (Maybe REmatch)

re_search2 :: PatBuffer         -- Double buffer search
           -> _PackedString
           -> _PackedString
           -> Int               -- start index
           -> Int               -- range (?)
           -> Int               -- stop index
           -> Bool              -- True <=> results in registers
           -> PrimIO (Maybe REmatch)
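
To make the GroupBounds convention concrete, here is a minimal sketch in present-day Haskell. It uses ordinary String in place of `_PackedString', and the helpers `slice' and `splitAround' are made up for illustration; they are not part of `Regex'. It assumes, as the "reg(exp)" example in the comments above suggests, that both bounds are inclusive and that indices start from 0.

```haskell
-- Illustration only: String stands in for _PackedString, and
-- slice and splitAround are hypothetical helpers, not Regex functions.

type GroupBounds = (Int, Int)

-- Extract the substring covered by a pair of bounds, assuming both
-- ends are inclusive and indices start from 0.
slice :: GroupBounds -> String -> String
slice (from, to) = take (to - from + 1) . drop from

-- Given the bounds of a whole match, compute the bounds of the text
-- before it and after it (the $` and $' pieces). An empty piece is
-- represented by a pair whose end is one less than its start.
splitAround :: GroupBounds -> String -> (GroupBounds, GroupBounds)
splitAround (from, to) str = ((0, from - 1), (to + 1, length str - 1))
```

With these definitions, slice (5,7) "a regexp" gives "exp" and slice (2,7) "a regexp" gives "regexp". Splitting around (2,7) yields the bounds (0,1) for the text before the match ("a ") and (8,7) for the empty text after it.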

The `MatchPS' module provides Perl-like, "higher-level" facilities for operating on `_PackedString's. The regular expressions are written in Perl syntax. The "flags" argument of the various functions can include `i' for case-insensitive matching, `s' for single-line mode, and `g' for global. (It's probably worth your time to peruse the source code...)
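
How the flag string is interpreted is not spelled out here, so the following is only a sketch of one plausible reading: each character switches on one option, and any other character is ignored (an assumption). The `Flags' record and `parseFlags' are hypothetical names, not part of `MatchPS'.

```haskell
-- Illustration only: Flags and parseFlags are hypothetical and do not
-- appear in MatchPS. Unknown flag characters are silently ignored,
-- which is an assumption rather than documented behaviour.

data Flags = Flags
  { caseInsensitive :: Bool   -- set by 'i'
  , singleLine      :: Bool   -- set by 's'
  , global          :: Bool   -- set by 'g'
  } deriving (Eq, Show)

-- Turn a flag string such as "gi" into a record of options.
parseFlags :: [Char] -> Flags
parseFlags fs = Flags
  { caseInsensitive = 'i' `elem` fs
  , singleLine      = 's' `elem` fs
  , global          = 'g' `elem` fs
  }
```

Under this reading, parseFlags "gi" switches on the case-insensitive and global options and leaves single-line mode off, while parseFlags "" leaves all three off.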

matchPS :: _PackedString    -- regexp
        -> _PackedString    -- string to match
        -> [Char]           -- flags
        -> Maybe REmatch    -- info about what matched and where

searchPS :: _PackedString   -- regexp
         -> _PackedString   -- string to match
         -> [Char]          -- flags
         -> Maybe REmatch

-- Perl-like match-and-substitute:
substPS :: _PackedString    -- regexp
        -> _PackedString    -- replacement
        -> [Char]           -- flags
        -> _PackedString    -- string
        -> _PackedString

-- same as substPS, but the result has no prefix and suffix
-- (the text before and after the match):
replacePS :: _PackedString  -- regexp
          -> _PackedString  -- replacement
          -> [Char]         -- flags
          -> _PackedString  -- string
          -> _PackedString

match2PS :: _PackedString   -- regexp
         -> _PackedString   -- string1 to match
         -> _PackedString   -- string2 to match
         -> [Char]          -- flags
         -> Maybe REmatch

search2PS :: _PackedString  -- regexp
          -> _PackedString  -- string1 to match
          -> _PackedString  -- string2 to match
          -> [Char]         -- flags
          -> Maybe REmatch

-- functions to pull the matched pieces out of an REmatch:

getMatchesNo    :: REmatch -> Int
getMatchedGroup :: REmatch -> Int -> _PackedString -> _PackedString
getWholeMatch   :: REmatch -> _PackedString -> _PackedString
getLastMatch    :: REmatch -> _PackedString -> _PackedString
getAfterMatch   :: REmatch -> _PackedString -> _PackedString

-- (reverse) brute-force string matching;
-- Perl equivalent is index/rindex:
findPS, rfindPS :: _PackedString -> _PackedString -> Maybe Int

-- Equivalent to Perl "chop" (off the last character, if any):
chopPS :: _PackedString -> _PackedString

-- matchPrefixPS: tries to match as much as possible of strA starting
-- from the beginning of strB (handy when matching fancy literals in
-- parsers):
matchPrefixPS :: _PackedString -> _PackedString -> Int
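
The last three utilities are simple enough to model directly. The sketch below works on ordinary String rather than `_PackedString', and its function names (`findStr', `rfindStr', `chopStr', `matchPrefixLen') are invented for illustration. Two points are assumptions: the argument order (string to search first, then the substring, as in Perl's index and rindex), and the reading of the matchPrefixPS result as the length of the longest common prefix.

```haskell
import Data.List (isPrefixOf, tails)

-- Illustration only: these work on String, not _PackedString, and the
-- argument order (string first, then substring) is an assumption
-- borrowed from Perl's index/rindex.

-- All positions at which sub occurs in str, in increasing order.
occurrences :: String -> String -> [Int]
occurrences str sub =
  [i | (i, rest) <- zip [0 ..] (tails str), sub `isPrefixOf` rest]

-- Index of the first occurrence of sub in str, if any.
findStr :: String -> String -> Maybe Int
findStr str sub =
  case occurrences str sub of
    (i : _) -> Just i
    []      -> Nothing

-- Index of the last occurrence of sub in str, if any.
rfindStr :: String -> String -> Maybe Int
rfindStr str sub =
  case occurrences str sub of
    []   -> Nothing
    hits -> Just (last hits)

-- Drop the last character, if there is one.
chopStr :: String -> String
chopStr "" = ""
chopStr s  = init s

-- Length of the longest prefix of strA that also starts strB
-- (one possible reading of the Int that matchPrefixPS returns).
matchPrefixLen :: String -> String -> Int
matchPrefixLen strA strB = length (takeWhile id (zipWith (==) strA strB))
```

For instance, findStr "a regexp" "e" gives Just 3 and rfindStr "a regexp" "e" gives Just 5; chopStr "abc" gives "ab"; and matchPrefixLen "regexp" "regular" gives 3.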

