5.2. Basic interface

If you compile your Alex file without a %wrapper declaration, then you get access to the lowest-level API to the lexer. You must provide definitions for the following, either in the same module or imported from another module:

type AlexInput
alexGetByte       :: AlexInput -> Maybe (Word8,AlexInput)
alexInputPrevChar :: AlexInput -> Char

The generated lexer is independent of the input type, which is why you have to provide a definition for the input type yourself. The input type must keep track of the previous character in the input stream, because alexInputPrevChar is used to implement patterns with a left context (those that begin with ^ or set^). If your lexical specification never uses a left context, you can ignore the previous character and have alexInputPrevChar return undefined.
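As a minimal sketch of these three definitions, assuming the input is a String whose characters all fit in a single byte, the following keeps the previous character alongside the remaining input. The pair representation and the mkInput helper are illustrative choices, not something Alex prescribes; a lexer for Unicode input would instead encode each character as UTF-8 and hand out the bytes one at a time.

```haskell
import Data.Char (ord)
import Data.Word (Word8)

-- Hypothetical input type: the previous character plus the remaining input.
-- Assumption: every character fits in one byte (code point below 256).
type AlexInput = (Char, String)

-- Illustrative helper: wrap a String, pretending a newline precedes it so
-- that a ^ pattern can match at the very start of the input.
mkInput :: String -> AlexInput
mkInput s = ('\n', s)

-- Hand out the next byte and remember the character it came from.
alexGetByte :: AlexInput -> Maybe (Word8, AlexInput)
alexGetByte (_, [])     = Nothing
alexGetByte (_, c : cs) = Just (fromIntegral (ord c), (c, cs))

-- The character consumed most recently, used for left contexts.
alexInputPrevChar :: AlexInput -> Char
alexInputPrevChar (prev, _) = prev
```

Storing the previous character in the input value itself means alexInputPrevChar needs no extra state, at the cost of one extra field per input.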

Alex will provide the following function:

alexScan :: AlexInput             -- The current input
         -> Int                   -- The "start code"
         -> AlexReturn action     -- The return value

data AlexReturn action
  = AlexEOF

  | AlexError
      !AlexInput     -- Remaining input

  | AlexSkip
      !AlexInput     -- Remaining input
      !Int           -- Token length

  | AlexToken  
      !AlexInput     -- Remaining input
      !Int           -- Token length
      action         -- action value

Calling alexScan will scan a single token from the input stream, and return a value of type AlexReturn. The value returned is either:

AlexEOF

The end-of-file was reached.

AlexError

A valid token could not be recognised.

AlexSkip

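A token was matched, but the rule that matched it has no action associated with it, so there is nothing to return other than the remaining input and the token length.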

AlexToken

A token was matched, and the action associated with it is returned.

The action is simply the value of the expression inside {...} on the right-hand side of the appropriate rule in the Alex file. Alex doesn't specify what type these expressions should have; it only requires that they all have the same type, otherwise you'll get a type error when you try to compile the generated lexer.

Once you have the action, it is up to you what to do with it. The type of action could be a function which takes the String representation of the token and returns a value in some token type, or it could be a continuation that takes the new input and calls alexScan again, building a list of tokens as it goes.
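The first of these options can be sketched as a driver loop like the one below. To keep the example self-contained, alexScan here is a hand-written stand-in with the same interface as the generated function; in a real project it comes from Alex and is driven by your rules. The Token type, the Action synonym, the tokens function, and the plain String input type are all assumptions made for illustration.

```haskell
import Data.Char (isAlpha, isDigit, isSpace)

-- Hypothetical token and action types; Alex fixes neither.
data Token = TInt Int | TIdent String deriving (Eq, Show)
type Action = String -> Token

-- Simplified input type for this sketch: just the remaining characters.
type AlexInput = String

-- Same shape as the AlexReturn type that Alex generates.
data AlexReturn a
  = AlexEOF
  | AlexError !AlexInput
  | AlexSkip  !AlexInput !Int
  | AlexToken !AlexInput !Int a

-- Hand-written STAND-IN for the generated alexScan (not Alex output):
-- whitespace is skipped, digit runs and letter runs become tokens.
alexScan :: AlexInput -> Int -> AlexReturn Action
alexScan [] _ = AlexEOF
alexScan inp@(c : _) _
  | isSpace c = munch isSpace Nothing
  | isDigit c = munch isDigit (Just (TInt . read))
  | isAlpha c = munch isAlpha (Just TIdent)
  | otherwise = AlexError inp
  where
    munch p act =
      let (tok, rest) = span p inp
          len         = length tok
      in maybe (AlexSkip rest len) (AlexToken rest len) act

-- Driver loop: scan with start code 0 until EOF, applying each action
-- to the text of the token it matched.
tokens :: String -> Either String [Token]
tokens = go
  where
    go inp = case alexScan inp 0 of
      AlexEOF                -> Right []
      AlexError rest         -> Left ("lexical error at: " ++ take 10 rest)
      AlexSkip inp' _        -> go inp'
      AlexToken inp' len act -> (act (take len inp) :) <$> go inp'
```

With this stand-in, tokens "foo 42 bar" evaluates to Right [TIdent "foo",TInt 42,TIdent "bar"]. The loop recovers the token text with take len on the input before the match, which only works here because the sketch's input type is a plain String.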

This is pretty low-level stuff; you have complete flexibility about how you use the lexer, but there might be a fair amount of support code to write before you can actually use it. For this reason, we also provide a selection of wrappers that add some common functionality to this basic scheme. Wrappers are described in the next section.

There is another entry point, which is useful if your lexical specification contains any predicates (see Section 3.2.2.1, “Contexts”):

alexScanUser
         :: user             -- predicate state
         -> AlexInput        -- The current input
         -> Int              -- The "start code"
         -> AlexReturn action -- The return value, as for alexScan

The extra argument, of some type user, is passed to each predicate.
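As a small illustration of what a predicate can do with that argument, the sketch below assumes the predicate type described in the Contexts section (user state, input before the token, token length, input after the token). LexerOpts, allowHash, and hashEnabled are hypothetical names, and the String input type is a simplification.

```haskell
-- Hypothetical user state: options that predicates may consult.
newtype LexerOpts = LexerOpts { allowHash :: Bool }

-- Simplified input type for this sketch.
type AlexInput = String

-- A predicate in the shape assumed above. It ignores the inputs and the
-- token length and looks only at the user state, so a rule guarded by it
-- matches only when the option is switched on.
hashEnabled :: LexerOpts -> AlexInput -> Int -> AlexInput -> Bool
hashEnabled opts _before _len _after = allowHash opts
```

A caller would then pass a value such as LexerOpts True as the first argument of alexScanUser, and every predicate in the specification would receive it.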