question about lazy io

David Roundy droundy@jdj5.mit.edu
Sat, 12 Oct 2002 07:59:27 -0400


Hello.  I have a problem I ran into when I tried writing a function to use
lazy IO.  Specifically, I would like to be able to compare two directory
trees, and I thought the best way to do this would be to compartmentalize
things by writing an IO function to read in a whole directory tree
(including the contents of its files) in a lazy manner, and then the code
to manipulate this tree could be purely functional and non-monadic.

The problem is that while reading a file in with readFile (or openFile
followed by hGetContents) is lazy regarding the reading of the actual file
data, it opens the file handle immediately and keeps it open until the
contents have been consumed, which (at least on GHC under Linux) quickly
exhausts the available file handles.  I guess this makes sense, since if
the file weren't opened until it was actually used, a file open error
couldn't raise an exception at the appropriate time...
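
For comparison, here is a minimal sketch of the strict alternative, which
avoids the handle exhaustion at the cost of giving up laziness.  It assumes
a reasonably recent GHC base library (withFile and evaluate), and
readFileStrict is a hypothetical name, not a standard function:

```haskell
import Control.Exception (evaluate)
import System.IO (IOMode (ReadMode), hGetContents, withFile)

-- Hypothetical helper (not a library function): read a whole file
-- strictly, so that the handle is closed before the function returns.
readFileStrict :: FilePath -> IO String
readFileStrict path =
  withFile path ReadMode $ \h -> do
    contents <- hGetContents h
    -- Forcing the length walks the entire string, which pulls all of
    -- the file data in while the handle is still open.
    _ <- evaluate (length contents)
    return contents
```

With this, at most one handle is open at a time, but every file in the tree
is held in memory whether or not the comparison ever looks at it.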

I think this question is related to the thread on config files and pure
file reading.  Certainly if there were a pure functional file read
(i.e. readFileOnce), that would solve my problem.  I hate to have to write
monadic IO code right into the core of my code.

The Haskell 98 Library Report describes the state of a file handle after
hGetContents as 'semi-closed', meaning it will be closed as soon as its
entire contents have been read.  Is there any way to get the file handle
into a 'semi-opened' state, in which it will be opened as soon as it starts
getting read? That's what I'd really like.

fwiw, here's the code I'd like to use to slurp up a directory and its
contents: 

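>import System.Directory  -- named Directory in the Haskell 98 libraries
>
>-- The Slurpy type is inferred here from how it is used below.
>data Slurpy = SlurpDir FilePath [Slurpy]
>            | SlurpFile FilePath [String]
>
>slurp :: FilePath -> IO Slurpy
>slurp dirname = do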
>    isdir <- doesDirectoryExist dirname
>    if isdir
>       then do
>            former_dir <- getCurrentDirectory
>            fnames <- getDirectoryContents dirname
>            setCurrentDirectory dirname
>            sl <- slurp_list $ skip_hidden fnames
>            setCurrentDirectory former_dir
>            return $ SlurpDir dirname sl
>       else do
>            isfile <- doesFileExist dirname
>            if isfile
>              then do  
>                   contents <- readFile dirname
>                   return $ SlurpFile dirname (lines contents)
>              else return $ SlurpFile "Oops" [] 
>
>slurp_list :: [FilePath] -> IO [Slurpy]
>slurp_list [] = return []
>slurp_list (f:fs) = do
>    s <- slurp f
>    ss <- slurp_list fs
>    return (s:ss)
>
>skip_hidden :: [FilePath] -> [FilePath]
>skip_hidden [] = []
>skip_hidden (('.':_):fps) = skip_hidden fps
>skip_hidden (fp:fps) = fp : skip_hidden fps

To restate my question (which I may have forgotten to ask...), is there a
good way to do this? The only thing that comes to mind is to use unsafeIO,
which sounds scary. (Mostly because I don't think I would really understand
the consequences.)
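
For concreteness, a minimal sketch of what the 'semi-opened' idea might
look like, assuming that "unsafeIO" here means something like GHC's
unsafeInterleaveIO from System.IO.Unsafe.  The name readFileDeferred is
hypothetical:

```haskell
import System.IO.Unsafe (unsafeInterleaveIO)

-- Hypothetical helper (not a library function): defer the whole
-- readFile action, including the open, until the returned string is
-- first demanded.
readFileDeferred :: FilePath -> IO String
readFileDeferred path = unsafeInterleaveIO (readFile path)
```

Two consequences seem worth noting.  Any error from opening the file would
surface wherever the string is first demanded, possibly inside pure code.
And because the open happens at that later point, a relative path is
resolved against whatever the current directory is by then, so code that
calls setCurrentDirectory while slurping would probably need to pass
absolute paths.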
-- 
David Roundy
http://civet.berkeley.edu/droundy/