[Haskell-cafe] Overlaping Parsec rules

Christian Maeder Christian.Maeder at dfki.de
Mon Mar 7 12:22:41 CET 2011


You should parse keywords using:

keyword s = try (string s) >> notFollowedBy (letter <|> digit)

C.

Am 07.03.2011 11:34, schrieb Hauschild, Klaus (EXT):
> Hi,
>  
> to solve this ICFP task _http://www.cs.cornell.edu/icfp/task.htm_ I'm
> currnetly working on the parser. With the hint from Thu (reading Phillip
> Wadlers monadic parser paper) and consulting
> _http://en.wikibooks.org/wiki/Write_Yourself_a_Scheme_in_48_Hours/Parsing_
> I produce a first working version of the parser.
> After this great moment I completed the token definition and near all
> parsing rules. For the complete code have a look at
> _http://code.google.com/p/hgmltracer/source/browse/#svn%2Ftrunk%2FhGmlTracer_
>  
> data GmlToken =
>        
>         -- structures
>        
>         FunctionToken TokenSequence     |
>        
>         ArrayToken TokenSequence        |
>        
>         -- control operators
>        
>         ApplyToken                      |
>        
>         IfToken                         |
>        
>         -- number operators
>        
>         AddiToken                       |
>        
>         AddfToken                       |
>        
>         ACosToken                       |
>        
>         ASinToken                       |
>        
>         ClampfToken                     |
>        
>         CosToken                        |
>        
>         DiviToken                       |
>        
>         DivfToken                       |
>        
>         EqiToken                        |
>        
>         EqfToken                        |
>        
>         FloorToken                      |
>        
>         FracToken                       |
>        
>         LessiToken                      |
>        
>         LessfToken                      |
>        
>         ModiToken                       |
>        
>         MuliToken                       |
>        
>         MulfToken                       |
>        
>         NegiToken                       |
>        
>         NegfToken                       |
>        
>         ToRealToken                     |
>        
>         SinToken                        |
>        
>         SqrtToken                       |
>        
>         SubiToken                       |
>        
>         SubfToken                       |
>        
>         -- points
>        
>         GetXToken                       |
>        
>         GetYToken                       |
>        
>         GetZToken                       |
>        
>         PointToken                      |
>        
>         -- arrays
>        
>         GetToken                        |
>        
>         LengthToken                     |
>        
>         -- environment
>        
>         IdentifierToken String          |
>        
>         BinderToken String              |
>        
>         -- types
>        
>         BoolToken Bool                  |
>        
>         IntToken Int                    |
>        
>         RealToken Double                |
>        
>         StringToken String              deriving Show  
>  
> And
>  
> parseGml :: String -> [GmlToken]
>        
> parseGml input = case parse parseList "gml" input of
>        
>     Left err -> error ("Parse error: " ++ (show err))
>        
>     Right gml -> gml
>        
> 
>        
> parseList = sepBy parseGml' spaces
>        
> 
>        
> parseGml' =
>        
>         -- structures
>        
> --        parseFunction
>        
> --    <|> parseArray
>        
>         -- control operators
>        
>         parseControlOperators
>        
>         -- number operators
>        
>     <|> parseNumberOperators
>        
>         -- points
>        
>     <|> parsePointOperators
>        
>         -- arrays
>        
>     <|> parseArrayOperators
>        
>         -- types
>        
>     <|> parseBool
>        
>     <|> parseInteger
>        
>     <|> parseString
>        
>         -- environment
>        
>     <|> parseIdentifier
>        
>     <|> parseBinder
>        
> 
>        
> parseArray = parseSequence '[' ']'
>        
> parseFunction = parseSequence '{' '}'
>        
> parseSequence start end = do char start
>        
>                              array <- parseList
>        
>                              char end
>        
>                              return $ ArrayToken array
>        
> 
>        
> parseControlOperators = parseApply <|> parseIf
>        
> parseApply = do string "apply"
>        
>                 return $ ApplyToken
>        
> parseIf = do string "if"
>        
>              return $ IfToken
>        
> 
>        
> parseNumberOperators = do string "addi"
>        
>                           return $ AddiToken
>        
>                    <|> do string "addf"
>        
>                           return $ AddfToken
>        
>                    <|> do string "acos"
>        
>                           return $ ACosToken
>        
>                    <|> do string "asind"
>        
>                           return $ ASinToken
>        
>                    <|> do string "clampf"
>        
>                           return $ ClampfToken
>        
>                    <|> do string "cos"
>        
>                           return $ CosToken
>        
>                    <|> do string "divi"
>        
>                           return $ DiviToken
>        
>                    <|> do string "divf"
>        
>                           return $ DivfToken
>        
>                    <|> do string "eqi"
>        
>                           return $ EqiToken
>        
>                    <|> do string "eqf"
>        
>                           return $ EqfToken
>        
>                    <|> do string "floor"
>        
>                           return $ FloorToken
>        
>                    <|> do string "frac"
>        
>                           return $ FracToken
>        
>                    <|> do string "lessi"
>        
>                           return $ LessiToken
>        
>                    <|> do string "lessf"
>        
>                           return $ LessfToken
>        
>                    <|> do string "modi"
>        
>                           return $ ModiToken
>        
>                    <|> do string "muli"
>        
>                           return $ MuliToken
>        
>                    <|> do string "mulf"
>        
>                           return $ MulfToken
>        
>                    <|> do string "negi"
>        
>                           return $ NegiToken
>        
>                    <|> do string "negf"
>        
>                           return $ NegfToken
>        
>                    <|> do string "real"
>        
>                           return $ ToRealToken
>        
>                    <|> do string "sin"
>        
>                           return $ SinToken
>        
>                    <|> do string "Sqrt"
>        
>                           return $ SqrtToken
>        
>                    <|> do string "subi"
>        
>                           return $ SubiToken
>        
>                    <|> do string "subf"
>        
>                           return $ SubfToken
>        
> 
>        
> parsePointOperators = do string "getx"
>        
>                          return $ GetXToken
>        
>                   <|> do string "gety"
>        
>                          return $ GetYToken
>        
>                   <|> do string "getz"
>        
>                          return $ GetZToken
>        
>                   <|> do string "point"
>        
>                          return $ PointToken
>        
> 
>        
> parseArrayOperators = do string "get"
>        
>                          return $ GetToken
>        
>                   <|> do string "length"
>        
>                          return $ LengthToken
>        
> 
>        
> parseInteger = liftM (IntToken . read) $ many1 digit
>        
> parseBool = do string "true"
>        
>                return $ BoolToken True
>        
>         <|> do string "false"
>        
>                return $ BoolToken False
>        
> parseString = do char '"'
>        
>                  string <- many (noneOf "\"")
>        
>                  char '"'
>        
>                  return $ StringToken string
>        
> 
>        
> parseIdentifier = do identifier <- many (noneOf " ")
>        
>                      return $ IdentifierToken identifier
>        
> parseBinder = do char '/'
>        
>                  binder <- many (noneOf " ")
>        
>                  return $ BinderToken binder
>        
> 
>        
> parseComment = do char '%'
>        
>                   many (noneOf "")
>        
>                   newline
>        
>                   return $ ()
>        
> 
>        
> spaces = skipMany1 space       
>  
>  
> After gluing all this together in my mind all worked well. But it
> doesn't. The test file for parsing looks like:
>  
> { /self /n
>   n 2 lessi
>   { 1 }
>   { n 1 subi self self apply n muli }
>   if
> } /fact
>  
> 12 fact fact apply
>  
> 
>     * I think there is a problem with overlaping rules. There is a
>       parser rule consuming "negi" and resulting in the NegiToken. A
>       single "n" is a valid identifier. For the example file my parser
>       says: unexpected " ", expecting "negi"
>     * I think the same problem is present for "parseInteger" and
>       "parseReal" (currently no in code but looks like "parseReal = do a
>       <- many1 digit \n char '.' \n b <- many1 digit \n return $
>       RealToken (read (a ++ "." ++ b)"
>     * Something with "parseFunction" is going really wrong.
>     * All parsig rules are designed with the condition that each
>       "construct" is separated by whitespaces. For this before parsing
>       the input will be preprocessed: insert spaces, removing
>       whitespaces and so on. Now in the parsed result appears
>       (IdentifierToken ""). I think my version is not the best way to
>       parse a identifer:
> 
> parseIdentifier = *do* identifier <- many (noneOf " ")
>                      return $ *IdentifierToken* identifier
>  
> Please help me.
>  
> Klaus
>  
> 
> 
> 
> _______________________________________________
> Haskell-Cafe mailing list
> Haskell-Cafe at haskell.org
> http://www.haskell.org/mailman/listinfo/haskell-cafe



More information about the Haskell-Cafe mailing list