IO monad and lazy evaluation

Hal Daume III hdaume@ISI.EDU
Wed, 21 May 2003 07:25:33 -0700 (PDT)


Hi Graham,

> strict functions in monad-chains".  I'm hoping the Haskell community has 
> some experience with this kind of issue to offer some more helpful advice, 
> or even tools to detect unsafe combinations.  Maybe a discussion of safe 
> programming patterns would be a useful interim step?

I'm not sure what sort of advice would be most helpful, but I can share
a few observations:

  When choosing between openFile/.../hClose and readFile, use the
  openFile/.../hClose combination only if you expect to open a lot of files.

  Rationale: readFile is supposed to close the handle when you're done,
  but until the whole string has been consumed the handle really just sits
  in a semi-closed state.  This means that if you're reading a lot of
  files without consuming each one completely, you can run out of handles.
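To make the semi-closed handle issue concrete, here is a minimal sketch (not from the original post) of the common seq-based workaround: forcing the entire string returned by readFile makes the lazy read reach end-of-file, at which point the handle is closed rather than left semi-closed. The names readFileForced and readAllForced are hypothetical.

```haskell
-- Hypothetical helper: read a file with the lazy readFile, then force the
-- entire string with length/seq so the underlying handle reaches EOF and is
-- closed before we return, instead of lingering in a semi-closed state.
readFileForced :: FilePath -> IO String
readFileForced path = do
  s <- readFile path
  length s `seq` return s

-- Reading many files this way leaves no handle open after each read,
-- whereas mapM readFile would leave every handle semi-closed until its
-- string is consumed (or garbage collected).
readAllForced :: [FilePath] -> IO [String]
readAllForced = mapM readFileForced
```

Only the spine of the string is forced here, which is enough to drive the read to EOF; for a String that is the whole content.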

  If you need to use openFile/.../hClose because of file handle issues,
  read the file (or whatever parts of it you need) strictly.  I actually have
  a function in my general library:

    readFileCloseBy :: DeepSeq a => FilePath -> (String -> a) -> IO a

  which opens the file, parses it using the supplied function, deepSeqs
  it to make it strict and then closes the handle.  By supplying id as
  the function, you get a version of readFile which is strict and
  always closes the Handle.
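A minimal sketch of how a function like readFileCloseBy might look. This is not Hal's library code: it assumes the NFData class and deepseq from the deepseq package's Control.DeepSeq as stand-ins for the DeepSeq class and deepSeq named in the signature above.

```haskell
import Control.DeepSeq (NFData, deepseq)
import System.IO

-- Sketch of readFileCloseBy as described above: open the file, parse the
-- lazily read contents with the supplied function, fully evaluate the result
-- with deepseq, and only then close the handle.
-- Assumption: NFData/deepseq (package deepseq) stand in for DeepSeq/deepSeq.
readFileCloseBy :: NFData a => FilePath -> (String -> a) -> IO a
readFileCloseBy path parse = do
  h <- openFile path ReadMode
  s <- hGetContents h
  let result = parse s
  result `deepseq` hClose h   -- hClose runs only after result is fully forced
  return result

-- Supplying id gives a strict readFile that always closes its handle.
readFileStrict :: FilePath -> IO String
readFileStrict path = readFileCloseBy path id
```

If the parser only inspects part of the input (say `take 10`), only that part is read before the handle is closed. Like the description, this sketch does not close the handle if evaluating the parse throws; wrapping the open/close in Control.Exception.bracket would cover that case.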

> Thinking some more... I'm reminded of some discussions I had a few years 
> ago about the timing of calls to Java finalizers, and problems this could 
> cause for network I/O programs because using finalizers to close network 
> sockets would lead to unexpected resource problems.  The only reliable 

This sounds very similar to the semi-closed handle issue in readFile.

> Anyway, my thoughts are leading me to the idea that the problem is a 
> disconnect (lack of formal connection or interlock) between the actions of 
> opening a file, reading its contents and closing it.  For example, one 
> could imagine a structure:
> 
>      hSafeGetContents :: Handle -> (String -> a) -> a
>      hSafeGetContents handle function =
>          function $ hUnsafeGetContents handle
> 
> Now the result string can be as lazy as you like, but I think one can 
> guarantee that the handle won't be closed until the function has used as 
> much of the content as it may need.

Alas, this is not true :).  Let function=id and you'll see the problem:
the result is then just the unevaluated lazy string, so nothing has been
read yet.  You need to put a seq or a deepSeq in there somewhere; otherwise
just applying the function won't cause any of the file to be read.
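One way to act on this, sketched under the same assumption that the deepseq package is available: give the function an IO type so the result can be fully forced before the caller closes the handle. The name hSafeGetContents is borrowed from the quoted proposal, but this signature differs from it, and firstLines is a hypothetical usage helper.

```haskell
import Control.DeepSeq (NFData, force)
import Control.Exception (evaluate)
import System.IO

-- Variant of the quoted hSafeGetContents idea. A pure result cannot order
-- itself before a later hClose, so this version lives in IO and uses
-- evaluate . force to read as much of the file as the function's result
-- depends on before returning. With f = id, the whole file is read.
hSafeGetContents :: NFData a => Handle -> (String -> a) -> IO a
hSafeGetContents h f = do
  s <- hGetContents h
  evaluate (force (f s))

-- Usage: only the first n lines are read before the handle is closed.
firstLines :: FilePath -> Int -> IO [String]
firstLines path n = do
  h <- openFile path ReadMode
  ls <- hSafeGetContents h (take n . lines)
  hClose h
  return ls
```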

 - Hal

> At 13:27 20/05/03 -0700, Hal Daume III wrote:
> >Yes.  This is because hGetContents (and hence readFile, etc.) use lazy
> >IO.  Just as in this case you might want hClose to force the file to be
> >read, in a case like:
> >
> > > do h <- openFile "really_large_file" ReadMode
> > >    c <- hGetContents h >>= return . head
> > >    hClose h
> > >    return c
> >
> >you probably don't want the close to read the whole file.  I'd argue that
> >that problem is not with hClose, but with hGetContents.  Really, a strict
> >version should be used in most situations.  Something like:
> >
> > > hGetContentsStrict h = do
> > >    b <- hIsEOF h
> > >    if b then return [] else do
> > >      c <- hGetChar h
> > >      r <- hGetContentsStrict h
> > >      return (c:r)
> >
> >of course, you could be smarter with buffering, etc.  Another way would be
> >to do something using seq/deepSeq.
> >
> >  - Hal
> >
> >--
> >  Hal Daume III                                   | hdaume@isi.edu
> >  "Arrest this man, he talks in maths."           | www.isi.edu/~hdaume
> >
> >On Tue, 20 May 2003, Graham Klyne wrote:
> >
> > > There seems to be a difficult-to-justify interaction between
> > > lazy evaluation and monadic I/O:
> > >
> > > [[
> > > -- file: SpikeIOMonadCloseHandle.hs
> > > -- Does hClose force completion of lazy I/O?
> > >
> > > import IO
> > >
> > > showFile fnam =
> > >      do  { fh <- openFile fnam ReadMode
> > >          ; fc <- hGetContents fh
> > >          ; hClose fh
> > >          ; putStr fc
> > >          }
> > >
> > > test = showFile "SpikeIOMonadCloseHandle.hs"
> > > ]]
> > >
> > > If I load this into Hugs and run it, the output is a single blank line.
> > >
> > > If I reverse the order of hClose and putStr, the source code is displayed.
> > >
> > > I think I can understand why this is happening, but it seems to me that
> > > there's a violation of referential transparency here:  I can't see any
> > > reasonable justification for the value of 'fc' to vary depending on
> > > whether it's actually used before or after some other I/O operation.
> > >
> > > I suppose I was expecting the call of hClose to force complete evaluation
> > > of any value that depends on the state prior to hClose.  I've no idea if
> > > there's a reasonable way to implement that.
> > >
> > > My concern is that this weakens the claim for monads that they provide
> > > a seamless integration between pure functional and stateful code;  cf.:
> > > [[
> > > We believe that, on the contrary, there are very significant differences
> > > between writing programs in C and writing in Haskell with monadic state
> > > transformers and IO:
> > > [...]
> > > - Usually, most of the program is neither stateful nor directly concerned
> > > with IO.  The monadic approach allows the graceful coexistence of a small
> > > amount of imperative code and the large purely functional part of the
> > > program
> > > [...]
> > > - The usual coroutining behaviour of lazy evaluation, in which the
> > > consumer of a data structure coroutines with its producer, extends to
> > > stateful computation as well.  As Hughes argues (Hughes 1989), the
> > > ability to separate what is computed from how much of it is computed is
> > > a powerful aid to writing modular programs
> > > ]]
> > > -- http://research.microsoft.com/Users/simonpj/Papers/state-lasc.ps.gz
> > >
> > > #g
> > >
> > >
> > > -------------------
> > > Graham Klyne
> > > <GK@NineByNine.org>
> > > PGP: 0FAA 69FF C083 000B A2E9  A131 01B9 1C7A DBCA CB5E
> > >
> > > _______________________________________________
> > > Haskell mailing list
> > > Haskell@haskell.org
> > > http://www.haskell.org/mailman/listinfo/haskell
> > >
> 
>
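For completeness, a hedged sketch of two ways to repair the quoted showFile example, written against System.IO rather than the old IO module: print the contents before closing the handle, or force the contents first. The name showFileStrict is hypothetical.

```haskell
import System.IO

-- Reordered version: putStr consumes the lazy contents while the handle is
-- still open, and hClose runs afterwards.
showFile :: FilePath -> IO ()
showFile fnam = do
  fh <- openFile fnam ReadMode
  fc <- hGetContents fh
  putStr fc
  hClose fh

-- Forcing version: length fc `seq` makes the whole file be read before
-- hClose, so what fc contains no longer depends on when it is used.
showFileStrict :: FilePath -> IO ()
showFileStrict fnam = do
  fh <- openFile fnam ReadMode
  fc <- hGetContents fh
  length fc `seq` hClose fh
  putStr fc
```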