[Haskell-cafe] Re: Lazy IO and closing of file handles

Pete Kazmier pete-expires-20070513 at kazmier.com
Wed Mar 14 19:08:44 EDT 2007


dons at cse.unsw.edu.au (Donald Bruce Stewart) writes:

> pete-expires-20070513:
>> When using readFile to process a large number of files, I am exceeding
>> the resource limits for the maximum number of open file descriptors on
>> my system.  How can I enhance my program to deal with this situation
>> without making significant changes?
>
> Read in data strictly, and there are two obvious ways to do that:
>
>     -- Via strings:
>
>     readFileStrict f = do
>         s <- readFile f
>         length s `seq` return s
>
>     -- Via ByteStrings
>     readFileStrict  = Data.ByteString.readFile
>     readFileStrictString  = liftM Data.ByteString.unpack . Data.ByteString.readFile
>
> If you're reading more than say, 100k of data, I'd use strict
> ByteStrings without hesitation. More than 10M, and I'd use lazy
> bytestrings.

Correct me if I'm wrong, but isn't this exactly what I wanted to
avoid: reading the entire file into memory?  In my previous email, I
was trying to explain that I want to read the files lazily, because
some of them are quite large and there is no reason to read beyond the
small set of headers at the top.  If I read each file into memory in
its entirety, that design goal is no longer met.
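
What I'm after is something more along these lines (a rough sketch
only; readHeaders and the "headers end at the first blank line"
convention are just for illustration):

    import Control.Exception (bracket)
    import System.IO (IOMode(ReadMode), hClose, hGetContents, openFile)

    -- Read just the header block (the lines before the first blank
    -- line), force it, and close the handle right away, so only one
    -- descriptor is ever open at a time.
    readHeaders :: FilePath -> IO [String]
    readHeaders f =
        bracket (openFile f ReadMode) hClose $ \h -> do
            s <- hGetContents h
            let hdrs = takeWhile (not . null) (lines s)
            length (concat hdrs) `seq` return hdrs

That only ever reads as far as the first blank line, and the handle is
closed as soon as the bracketed action returns.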

Nevertheless, I benchmarked with ByteStrings (both lazy and strict),
and in both cases the ByteString version of readFile produced the same
error about the maximum number of open files.  Incidentally, the lazy
bytestring version of my program was by far the fastest and used the
least memory, but it still crapped out on the open-file limit.
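
For what it's worth, the lazy-bytestring version boils down to roughly
this shape (a simplified sketch; headersOf is just illustrative).  My
understanding is that the handle opened by readFile is only closed once
the whole file has been consumed (or the contents are eventually
garbage collected), and taking just the headers never demands the rest,
so the descriptors pile up:

    import qualified Data.ByteString.Lazy.Char8 as L

    -- The pattern that runs out of descriptors: each readFile opens a
    -- handle, but because only the header lines are demanded, the file
    -- is never read to EOF and the handle is not closed promptly.
    headersOf :: FilePath -> IO [L.ByteString]
    headersOf f = do
        s <- L.readFile f
        return (takeWhile (not . L.null) (L.lines s))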

So I'm back to square one.  Any other ideas?

Thanks,
Pete


