[Haskell-cafe] Parsec operator with letter problem

Daniel Fischer daniel.is.fischer at web.de
Fri Mar 31 08:24:51 EST 2006


Hi,

probably somebody else has already come up with something better, but still...

I surmise that you have two kinds of infix-operators, 
1. dot-like operators, made up entirely of symbols (^!$%&/\,.:;#+-~* ...)
2. LaTeX-command-like operators, starting with a backslash and then followed 
by a nonempty sequence of letters (or possibly alphanumeric characters).

Then the following helps:

import Data.Char (isAlpha)

lexer = lexer0{P.reservedOp = rOp}
          where
            lexer0      = P.makeTokenParser testdef
            resOp0      = P.reservedOp lexer0
            resOp1 name =
              case name of
                ('\\':cs@(_:_))
                    | all isAlpha cs -> do string name
                                           notFollowedBy letter <?>
                                                ("end of " ++ show name)
                _   -> fail (show name ++ " no good reservedOp")
            rOp name = lexeme $ try $ resOp0 name <|> resOp1 name
            lexeme p = do { x <- p; P.whiteSpace lexer0; return x }

testdef = emptyDef
    { P.identStart      = letter <|> char '_'
    , P.identLetter     = alphaNum <|> char '_'
    , P.opStart         = oneOf $ nub $
                            map (\s -> head s) $ P.reservedOpNames testdef
 -- , P.opLetter        = oneOf (concat (P.reservedOpNames testdef))
    , P.opLetter = oneOf symbs
    , P.reservedOpNames = [ ".", "\\in" ] }
    where
      symbs = filter (not . isAlpha) . concat $ P.reservedOpNames testdef
---------------------------------------------------------------------
dafis at linux:~/Documents/haskell/Reto> cat input
a.n
dafis at linux:~/Documents/haskell/Reto> reto input
InfixExpr OP_Dot (Ident "a") (Ident "n")

If you have more complicated infix operators (e.g. \foo#bar:, :ouch:),
it won't be so easy, anyway, you have to change the definition of reservedOp.

Cheers,
Daniel

Am Freitag, 31. März 2006 01:15 schrieb Reto Kramer:
> I ran into the following problem with parsec's handling of operators.
> I hope someone on the list can show me a trick that resolve my
> current issue. At the end of this message is the full code for the
> reproducer.
>
> The language I'm parsing has infix operators of two forms. Some are
> special characters (e.g. a dot) and some are LaTeX like (e.g. \in).
> The letters that appear in the e.g. \in lead to problems with
> expression parsing.
>
> The following is what I expect:
>    $ cat input1
>    a.b
>    $ ./test input1
>    InfixExpr OP_Dot (Ident "a") (Ident "b")
> Good!
>
> Then I try (change the b to an n)
>    $ cat input2
>    a.n
>    $ ./test input2
>    Ident "a"
> OUTSCH!  Changing the name of an identifier changed the expression!
>
> Then I try (add a space right after the dot)
>    $ cat input3
>    a. n
>    $ ./test input2
>    InfixExpr OP_Dot (Ident "a") (Ident "n")
> Good!
>
> What is going on?
>
> The 'n' is part of the "\in" operator (see reservedOpNames in test.hs
> program at the end of message) and confuses the parsec expression
> parser's ability to determine the end of expressions/identifiers.
> Adding a space right after the '.' operator resolves the issue, but
> that's not a suitable option for my users.
>
> Unfortunately I cannot remove the "\in" operator from the
> reservedOpNames list since otherwise "\in" is not recognized as an
> infix operator itself anymore.
>
> QUESTION:
> how can I get parsec's expression parser to work with infix operators
> of the form "\in"?
>
> Thanks,
> - Reto
>
>
> -- complete reproducer (test.hs)
> -- compile with: ghc -o test -package parsec test.hs
>
> module Main where
>
> import List (nub)
> import System (getArgs)
> import Text.ParserCombinators.Parsec
> import Text.ParserCombinators.Parsec.Expr
> import Text.ParserCombinators.Parsec.Language( emptyDef )
> import qualified Text.ParserCombinators.Parsec.Token as P
>
> main :: IO ()
> main = do{ args <- getArgs
> 	 ; let fname = args !! 0
> 	 ; input <- readFile fname
> 	 ; case parse spec fname input of
>               Left err -> do{ putStr "parse error at "
>                             ; print err
> 			   }
>               Right x  -> print x
> 	 }
>
> data Op = OP_Dot | OP_In deriving (Show, Eq, Ord)
> data Expr = InfixExpr Op Expr Expr
>
> 	  | Ident String
> 	  | Number Integer deriving (Show, Eq, Ord)
>
> spec :: Parser Expr
> spec = do { whiteSpace
> 	  ; e <- expression
> 	  ; return e
> 	  }
>
> expression :: Parser Expr
> expression = buildExpressionParser table basicExpr <?> "expression"
>
> op_infix :: Op -> Expr  -> Expr -> Expr
> op_infix op a b = InfixExpr op a b
>
> table :: OperatorTable Char () Expr
> table = [ [binary "."  (op_infix OP_Dot) AssocLeft]
>          , [binary "\\in" (op_infix OP_In) AssocLeft] ]
> binary  name fun assoc = Infix (do{ reservedOp name; return fun }) assoc
>
> basicExpr :: Parser Expr
> basicExpr = choice [
>                do{ i <- identifier
> 		; return $ Ident i
> 		}
>              ]
>
> lexer = P.makeTokenParser testdef
>
> testdef = emptyDef {
>    P.identStart      = letter <|> char '_'
> , P.identLetter     = alphaNum <|> char '_'
> , P.opStart         = oneOf $ nub $
>                          map (\s -> head s) $ P.reservedOpNames testdef
> , P.opLetter        = oneOf (concat (P.reservedOpNames testdef))
> , P.reservedOpNames = [ ".", "\\in" ] }
>
> dot             = P.dot lexer
> parens          = P.parens lexer
> braces          = P.braces lexer
> squares         = P.squares lexer
> semiSep         = P.semiSep lexer
> semiSep1        = P.semiSep1 lexer
> commaSep        = P.commaSep lexer
> commaSep1       = P.commaSep1 lexer
> brackets        = P.brackets lexer
> whiteSpace      = P.whiteSpace lexer
> symbol          = P.symbol lexer
> identifier      = P.identifier lexer
> reserved        = P.reserved lexer
> reservedOp      = P.reservedOp lexer
> integer         = P.integer lexer
> natural         = P.natural lexer
> charLiteral     = P.charLiteral lexer
> stringLiteral   = P.stringLiteral lexer
>
> _______________________________________________
> Haskell-Cafe mailing list
> Haskell-Cafe at haskell.org
> http://www.haskell.org/mailman/listinfo/haskell-cafe

-- 

"In My Egotistical Opinion, most people's C programs should be
indented six feet downward and covered with dirt."
	-- Blair P. Houghton



More information about the Haskell-Cafe mailing list