[Haskell-cafe] simple parsec question

Immanuel Normann immanuel.normann at googlemail.com
Mon Mar 4 10:33:50 CET 2013


Andrey,
Thanks a lot for your effort! I have the same suspicion that the lookahead in
the content parser is the problem, but I don't know how to solve it either.
At least I learned from your code that noneOf is also quite a useful
parser in this context, which I had overlooked.
Anyway, if you find a solution, that would be great! After all, the task
itself isn't very specific but rather general: an alternation
between strictly structured text (the headline in my case) and loosely
structured text (the content in my case). It shouldn't be difficult to
build a parser for such a setting.

(BTW, I am aware that my test parser would (or rather should) parse only the
first section. For testing, that is sufficient.)
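A minimal sketch of one possible fix, assuming String input and the `Parser` synonym from `Text.Parsec.String` (the original code is polymorphic in the stream type). It makes two changes: `headline` is confined to a single line by using `noneOf "\n"` instead of `anyChar`, and `content` collects whole lines with `manyTill`, stopping at `lookAhead (try headline)` or at end of input, so the next headline is left unconsumed for the following `section`. The helper names `textLine` and `document`, and the trimming of blank lines around the content, are illustrative assumptions, not part of the original code.

```haskell
import Control.Monad (void)
import Data.List (intercalate)
import Text.Parsec
import Text.Parsec.String (Parser)

data Top = Top String deriving (Show, Eq)
data Content = Content String deriving (Show, Eq)
data Section = Section Top Content deriving (Show, Eq)

-- A headline is a single line ending in ":\n". 'noneOf "\n"' cannot run
-- past the end of the line, unlike 'anyChar'. The inner 'try' lets a ':'
-- that is not followed by a newline count as an ordinary character.
headline :: Parser Top
headline = Top <$> manyTill (noneOf "\n") (try (char ':' >> newline))

-- Hypothetical helper: one line without its newline. The last line of the
-- input may lack a trailing newline. It always consumes input on success.
textLine :: Parser String
textLine = ("" <$ newline) <|> (many1 (noneOf "\n") <* (void newline <|> eof))

-- Content is every line up to (not including) the next headline, or up to
-- the end of input. 'lookAhead' leaves the headline in the input for the
-- next 'section'; 'try' undoes a partial match on a line like "note: x".
content :: Parser Content
content = Content . intercalate "\n" . trimBlank <$> manyTill textLine sectionEnd
  where
    sectionEnd = void (lookAhead (try headline)) <|> eof
    -- Assumption: blank lines at both ends of a section are dropped.
    trimBlank = reverse . dropWhile null . reverse . dropWhile null

section :: Parser Section
section = Section <$> headline <*> content

-- Hypothetical top-level parser: leading blank lines, sections, end of input.
document :: Parser [Section]
document = skipMany newline *> many section <* eof
```

Loaded into ghci, `parse document "" "top 1:\n\nsome text ... bla\n\ntop 2:\n\nmore text ... bla bla\n"` should give `Right [Section (Top "top 1") (Content "some text ... bla"),Section (Top "top 2") (Content "more text ... bla bla")]`. Two consecutive headlines also work: the first simply gets empty content.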



2013/3/4 Andrey Chudnov <achudnov at gmail.com>

>  Immanuel,
> I tried but I couldn't figure it out. Here's a gist with my attempts and
> results so far: https://gist.github.com/achudnov/f3af65f11d5162c73064
> There, 'test' uses my attempt at specifying the parser, 'test2' uses yours.
> Note that your attempt wouldn't parse multiple sections -- for that you
> need to use 'many section' instead of just 'section' in 'parse'
> ('parseFromFile' in the original).
> I think what's going on is that the lookahead is wrong, but I'm not sure
> exactly how. I'll give it another go tomorrow if I have time.
>
> /Andrey
>
>
> On 03/03/2013 05:16 PM, Immanuel Normann wrote:
>
>    Andrey,
>
>  Thanks for your attempt, but it doesn't seem to work. The headline is the
> easy part; the content is what causes trouble.
>
> Let me write the code a bit more explicitly, so you can copy and paste it:
>
> ------------------------------------------
> {-# LANGUAGE FlexibleContexts #-}
>
> module Main where
>
> import Text.Parsec
> import Text.Parsec.String (parseFromFile)
>
> data Top = Top String deriving (Show)
> data Content = Content String deriving (Show)
> data Section = Section Top Content deriving (Show)
>
> headline :: Stream s m Char => ParsecT s u m Top
> headline = manyTill anyChar (char ':' >> newline) >>= return . Top
>
> content :: Stream s m Char => ParsecT s u m Content
> content = manyTill anyChar (try headline) >>= return . Content
>
> section :: Stream s m Char => ParsecT s u m Section
> section = do {h <- headline; c <- content; return (Section h c)}
> ------------------------------------------
>
>
>  Assume the following example text is stored in "/tmp/test.txt":
> ---------------------------
> top 1:
>
> some text ... bla
>
> top 2:
>
> more text ... bla bla
> ---------------------------
>
>  Now I run the section parser in ghci against the above-mentioned example
> text stored in "/tmp/test.txt":
>
> *Main> parseFromFile section "/tmp/test.txt"
> Right (Section (Top "top 1") (Content ""))
>
>  I don't understand the behaviour of the content parser here. Why does it
> return ""? Or perhaps more generally, I don't understand the manyTill
> combinator (though I read the docs).
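The empty result can be reproduced with the original `headline` and `content` alone. `manyTill` tries its end parser before consuming each character, and because `anyChar` also matches `'\n'`, `headline` run on the text left after `top 1:\n` does not stop at the end of a line: it succeeds on everything up to the next `:\n`. So the very first end check in `content` succeeds, `manyTill` returns `""`, and the text matched by `headline` is consumed rather than returned. A small sketch, specialised to String input (here `headline` returns a plain String instead of `Top`, and the name `leftover` is hypothetical):

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- The original 'headline', specialised to String input and returning the
-- raw String instead of wrapping it in 'Top'. 'anyChar' also matches '\n',
-- so a match can span several lines.
headline :: Parser String
headline = manyTill anyChar (char ':' >> newline)

-- The original 'content': 'manyTill' runs the end parser first, so if
-- 'try headline' succeeds immediately, the result is "" and the text it
-- matched has already been consumed.
content :: Parser String
content = manyTill anyChar (try headline)

-- Hypothetical example: what is left of the input after "top 1:\n".
leftover :: String
leftover = "\nsome text ... bla\n\ntop 2:\n\nmore text ... bla bla\n"
```

In ghci, `parse headline "" leftover` should return `Right "\nsome text ... bla\n\ntop 2"`, and `parse content "" leftover` should return `Right ""`.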
>
>  Side remark: of course, for this little task it is probably too much effort
> to use parsec. However, my content in fact has an internal structure which
> I would like to parse further, but I deliberately abstracted from these
> internals as they don't affect the problem stated above.
>
>  Immanuel
>
>
> 2013/3/3 Andrey Chudnov <achudnov at gmail.com>
>
>> Immanuel,
>> Since a heading always starts with a new line (and ends with a colon
>> followed by a carriage return or just a colon?), I think it might be useful
>> to first separate the input into lines and then classify them depending on
>> whether it's a heading or not and reassemble them into the value you need.
>> You don't even need parsec for that.
>>
>> However, if you really want to use parsec, you can write something like
>> (warning, not tested):
>> many $ liftM2 Section headline content
>>    where headline = anyChar `manyTill` (char ':' >> spaces >> newline)
>>                content  = anyChar `manyTill` (try $ newline >> headline)
>>
>> /Andrey
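A minimal sketch of the line-based approach described above, using only the Prelude and `Data.List`. It assumes that a headline is any line ending in `:` and that lines use `'\n'` endings (a trailing `'\r'` would stop the colon test from matching); lines before the first headline are dropped, and blank lines at both ends of a section body are trimmed. The names `isHeadline` and `sections` are illustrative.

```haskell
import Data.List (intercalate, isSuffixOf)

data Top = Top String deriving (Show, Eq)
data Content = Content String deriving (Show, Eq)
data Section = Section Top Content deriving (Show, Eq)

-- Assumption: a headline is any line whose last character is ':'.
isHeadline :: String -> Bool
isHeadline l = ":" `isSuffixOf` l

-- Split into lines, skip anything before the first headline, then let every
-- headline start a new section that owns the lines up to the next headline.
sections :: String -> [Section]
sections = go . dropWhile (not . isHeadline) . lines
  where
    go [] = []
    go (h : rest) =
      let (body, more) = break isHeadline rest
      -- 'init' drops the trailing ':' of the headline
      in Section (Top (init h)) (Content (intercalate "\n" (trimBlank body))) : go more
    -- Assumption: blank lines at both ends of a section body are dropped.
    trimBlank = reverse . dropWhile null . reverse . dropWhile null
```

For the example text, `sections "top 1:\n\nsome text ... bla\n\ntop 2:\n\nmore text ... bla bla\n"` should give `Section (Top "top 1") (Content "some text ... bla")` followed by `Section (Top "top 2") (Content "more text ... bla bla")`.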
>>
>>
>> On 3/3/2013 10:44 AM, Immanuel Normann wrote:
>>
>>> I am trying to parse semi-structured text with parsec; the parser should
>>> basically identify sections. Each section starts with a headline and has
>>> unstructured content - that's all. For instance, consider the following
>>> example text (inside the dashed lines):
>>>
>>> ---------------------------
>>>
>>> top 1:
>>>
>>> some text ... bla
>>>
>>> top 2:
>>>
>>> more text ... bla bla
>>>
>>>
>>> ---------------------------
>>>
>>> This should be parsed into a structure like this:
>>>
>>> [Section (Top 1) (Content "some text ... bla"), Section (Top 2) (Content
>>> "more text ... bla bla")]
>>>
>>> Say I have a parser "headline", but the content after a headline could
>>> be anything that is different from what "headline" parses.
>>> What could a "section" parser that makes use of "headline" look like?
>>> My idea would be to use the "manyTill" combinator, but I can't find an
>>> easy solution.
>>>
>>
>
>