[Haskell-cafe] simple parsec question

Immanuel Normann immanuel.normann at googlemail.com
Tue Mar 5 09:50:03 CET 2013


Carlo,

Thanks a lot! This looks very promising (though I have to test it for my
purpose more in depth). As you mention, the key seems to be the optionMaybe
combinator. Thanks for pointing to it.

Immanuel


2013/3/5 Carlo Hamalainen <carlo.hamalainen at gmail.com>

> On Mon, Mar 4, 2013 at 1:44 AM, Immanuel Normann <
> immanuel.normann at googlemail.com> wrote:
>
>> I am trying to parse a semi structured text with parsec that basically
>> should identify sections. Each section starts with a headline and has an
>> unstructured content - that's all.
>>
>
> Here's my attempt: https://gist.github.com/carlohamalainen/5087207
>
> {-# LANGUAGE FlexibleContexts #-}
>
> import Text.Parsec
> import Control.Applicative hiding ((<|>),many)
>
> -- Example input:
>
> {-
> top 1:
>
> some text ... bla
>
> top 2:
>
> more text ... bla bla
>
> -}
>
> data Top = Top String deriving (Show)
> data Content = Content [String] deriving (Show)
> data Section = Section Top Content deriving (Show)
>
> headline = do
>     t <- many1 (noneOf ":\n")
>     char ':'
>     newline
>
>     return $ Top t
>
> contentLine = do
>     x <- many (noneOf ":\n")
>     newline
>     return x
>
> content = do
>     line <- optionMaybe (try contentLine)
>
>     case line of Just x -> do xs <- content
>                               return (x:xs)
>                  _      -> return []
>
> section = do
>     h <- headline
>     c <- Content <$> content
>     return $ Section h c
>
> main = do
>     x <- readFile "simple.txt"
>     print $ parse (many section) "" x
>
>
> Example run using your sample data:
>
> $ runhaskell Simple.hs
> Right [Section (Top "top 1") (Content ["","some text ... bla",""]),Section
> (Top "top 2") (Content ["","more text ... bla bla",""])]
>
> Notes:
>
> * I had to assume that a content line does not contain a ':', because that
> is the only way to distinguish a head-line (correct me if I'm wrong).
>
> * The key was to use optionMaybe along with try; see the definition of
> content.
>
> * I haven't tested this code on very large inputs.
>
> * I slightly changed the definition of Content to have a list of Strings,
> one for each line. I'm sure this could be altered if you wanted to retain
> all whitespace.
>
> * I am still new to Parsec, so don't take this as the definitive answer ;-)
>
> --
> Carlo Hamalainen
> http://carlo-hamalainen.net
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.haskell.org/pipermail/haskell-cafe/attachments/20130305/b4288250/attachment.htm>


More information about the Haskell-Cafe mailing list