CSV parser, quotes?

Graham Klyne GK@ninebynine.org
Wed, 30 Jul 2003 21:56:19 +0100


The Parsec library code contains string-parsing (with escapes) that you 
might be able to use or canibalize.  See 'StringLiteral' et seq in 
ParsecToken.hs.  The escape handling there is a fair amount of code, bit it 
covers a lot of escape options.

#g
--

At 14:54 30/07/03 -0500, Shawn P. Garbett wrote:
>In attempting to improve the CSV parser based on comments, I have the
>following code that's attached. I'm having a heck of a time getting the
>double quotes = an escaped quote thing to work. There is some commented out
>code which was my last attempt. As it stands the code works, minus the
>escaped quotes. Remove the comments and it just hangs.
>
>module CSV (contents, csv) where
>
>import Parsec
>
>----------------------------------------------------------------------
>-- CSV Module
>----------------------------------------------------------------------
>
>-- Useful common parsers
>comma :: Parser Char
>comma  = char ','
>
>quote :: Parser Char
>quote  = char '\"'
>
>-- How to handle these buggers....
>--esc_quote :: Parser Char
>--esc_quote  = do {char '\"'; char '\"';}
>
>--text      :: Parser String
>--text       = do {esc_quote; t<-text; return ('\"':t)}
>--             <|> do {c <- noneOf "\""; t<-text; return (c:t) }
>--             <|> return ""
>
>-- A cell can be a quoted value, a number or empty
>-- Quotes can be embedded by using double quotes ""
>cell      :: Parser String
>cell       = between quote quote (many (noneOf "\"")) -- quoted values
>--cell       = between quote quote text -- quoted values
>              <|> many1 (noneOf "\",\n")               -- unquoted values
>              <|> return ""                            -- give up, Empty 
> cell
>
>-- Group of cells with a newline
>cells     :: Parser [String]
>cells      = do c <- sepBy cell comma
>                 newline
>                 return c
>
>-- Comma Separated Values, set of rows followed by eof
>csv       :: Parser [[String]]
>csv        = manyTill cells eof
>
>-- Useful For extracting comma delimited values of a cell
>contents  :: Parser [String]
>contents   = sepBy1 (many (noneOf ",")) comma
>
>_______________________________________________
>Haskell-Cafe mailing list
>Haskell-Cafe@haskell.org
>http://www.haskell.org/mailman/listinfo/haskell-cafe

-------------------
Graham Klyne
<GK@NineByNine.org>
PGP: 0FAA 69FF C083 000B A2E9  A131 01B9 1C7A DBCA CB5E