[Haskell-beginners] too lazy parsing?

Kyle Murphy orclev at gmail.com
Mon Feb 4 14:00:14 CET 2013


I can't say 100% for sure, but I'd guess it's because parsec is pure, and
the file operations are using lazy bytestrings. Since no IO operations are
applied to cont until after you close the handle, nothing can be read
(since at that time the handle is closed). If you want to keep the program
structured the same I believe there are functions that can convert a lazy
bytestring into a strict one, and then you can perform the parsing on that.
Alternatively you could rewrite things to close the file handle after you
write it's contents to the output file.

The default file operations in Haskell are known to be a source of
difficulty in terms of laziness, and there has been some debate as to
whether they're poorly designed or not. I might suggest you look into some
of the alternatives, particular those based on stream fusion principles,
that allow you to kill two birds with one stone by iteratively dealing with
input thereby forcing evaluation and also improving memory usage and making
it harder to trigger space leaks. I don't have the names available at the
moment or I'd provide them, but I'm pretty sure at least one of them is
named something like enumeratee, although I believe there's at least one
other that might debatably be considered better.
On Feb 4, 2013 5:51 AM, "Kees Bleijenberg" <k.bleijenberg at lijbrandt.nl>
wrote:

> module Main where ****
>
> ** **
>
> import Text.ParserCombinators.Parsec (many,many1,string, Parser, parse)***
> *
>
> import System.IO (IOMode(..),hClose,openFile,hGetContents,hPutStrLn)****
>
>
> ****
>
> parseFile hOut fn = do****
>
>                         handle <- openFile fn ReadMode****
>
>                         cont <- hGetContents
> handle                                       ****
>
>                         print cont****
>
>                         let res = parse (many (string "blah")) "" cont****
>
>                         hClose handle                    ****
>
>                         case res of****
>
>                             (Left err) -> hPutStrLn hOut $ "Error: " ++
> (show err)****
>
>                             (Right goodRes) -> mapM_ (hPutStrLn hOut)
> goodRes                         ****
>
>                  ****
>
> main = do   ****
>
>             hOut <- openFile "outp.txt" WriteMode****
>
>             mapM (parseFile hOut) ["inp.txt"]****
>
>             hClose hOut****
>
> ** **
>
> I’am writing a program that parses a lot of files. Above is the simplest
> program I can think of that demonstrates my problem.****
>
> The program above parses inp.txt.  Inp.txt has only the word blah in it.
> The output is saved in outp.txt. This file contains the word blah after
> running the program. if I comment out the line ‘print cont’ nothing is
> saved in outp.txt.  ****
>
> If I comment out ‘print cont’ and replace many with many1 in the following
> line, it works again?****
>
> Can someone explain to me what is going  on?****
>
> ** **
>
> Kees****
>
> _______________________________________________
> Beginners mailing list
> Beginners at haskell.org
> http://www.haskell.org/mailman/listinfo/beginners
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.haskell.org/pipermail/beginners/attachments/20130204/a4e2ce2b/attachment.htm>


More information about the Beginners mailing list