[Haskell-cafe] [ANNOUNCE] (and request for review): directory-tree v0.9.0

Jason Dagit dagit at codersbase.com
Tue Aug 10 21:14:41 EDT 2010


On Tue, Aug 10, 2010 at 5:54 PM, Brandon Simmons <
brandon.m.simmons at gmail.com> wrote:

> On Tue, Aug 10, 2010 at 4:34 PM, Jason Dagit <dagit at codersbase.com> wrote:
> >
> >
> > On Mon, Aug 9, 2010 at 10:48 PM, Brandon Simmons
> > <brandon.m.simmons at gmail.com> wrote:
> >>
> >> Greetings Haskellers!
> >>
> >> directory-tree is a module providing a directory-tree-like datatype
> >> along with Foldable and Traversable instances, along with a simple,
> >> high-level IO interface. You can see the package along with some
> >> examples here (apologies if the haddock docs haven't been generated
> >> yet) :
> >>
> >>    http://hackage.haskell.org/package/directory-tree
> >
> > If I understand what you're saying, then your library is very similar to an
> > abstraction that darcs had for years known as "Slurpy".  The experience in
> > the darcs project was that it led to performance issues and correctness
> > issues that were hard to find/fix.
> >>
> >> The primary change in this release is the addition of two
> >> experimental "lazy" functions: `readDirectoryWithL` and `buildL`.
> >> These functions use `unsafePerformIO` behind the scenes to traverse
> >> the filesystem as required by pure computations consuming the returned
> >> DirTree data structure. I believe I am doing this safely and sanely
> >> but would love if some more experienced folks could comment on the
> >> code.
> >
> > unsafePerformIO or unsafeInterleaveIO?
> > Either way, to me it seems a bit dangerous to be doing this sort of lazy IO.
> > If the directory structure is large will I run out of file handles?  How
> > will IO errors be handled?  Will I receive the exceptions in pure code or
> > inside my IO actions?  Will I run into space leaks if something holds on to
> > 1 file and then references it "after" the directory traversal?  I might have
> > my history wrong, but as I recall darcs started with lazy slurpies and moved
> > to doing things strictly due to space leaks, running out of file
> > descriptors, file descriptor leaks (not running out, but having the file be
> > locked long after darcs should have been 'done' with it), and exception
> > delivery.
>
> IO Errors are caught in a pure constructor called "Failed". In
> practice I think my unsafe version is better in many of those respects
> than the original, for example with regard to running out of file
> handles. Are you referring to lazy IO in general, which those problems
> you mention seem to apply to, or the use of unsafePerformIO?
>

It boils down to the same thing, right?


>
> I certainly want this module to be as useful and problem-free as
> possible, but I will be content if it is no less problematic than lazy
> IO is problematic.
>
> Could you elaborate on
>
>    > "Will I run into space leaks if something holds on to 1 file and
>    > then references it "after" the directory traversal"?
>
>
Let me give you an example.  Prelude's readFile is lazy.  That is, it
returns immediately and then only fetches from the file as you demand the
contents of the file.  This makes it possible to stream the file.  If you
process it in chunks, say 1 line at a time, then you can do so in constant
space.

If you then let the contents of the file escape, meaning somewhere else in
the processing references it, then you'll stop streaming it and start
holding on to the whole thing at once.  Something like this, untested:

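-- Streams: each line can be collected once it has been printed.
notleaky1 :: IO ()
notleaky1 = do
  xs <- readFile "foo"
  mapM_ print (lines xs)

-- Streams: length consumes xs in a single pass.
notleaky2 :: IO ()
notleaky2 = do
  xs <- readFile "foo"
  print (length xs)

-- Leaks space: xs is still needed by length, so the whole file is
-- retained while the lines are being printed.
leaky :: IO ()
leaky = do
  xs <- readFile "foo"
  mapM_ print (lines xs)
  print (length xs)

-- Leaks the handle: only 10 characters are ever demanded, so the file
-- is never read to the end and is not closed promptly.
handleleak :: IO String
handleleak = do
  xs <- readFile "foo"
  return (take 10 xs)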

Now, in leaky, if you calculated the length and printed the lines in the same
iteration, the leak would go away.  In the handleleak example the file stays
open even after the caller has consumed all 10 elements, because the rest of
the file is never demanded.
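
To make the "same iteration" fix for leaky concrete, here is a minimal sketch,
also untested against large files.  The names printAndCount and notleaky3 are
made up for illustration.  Note that it counts one newline per line, so the
total can differ by one from length xs when the file has no trailing newline.

```haskell
import Control.Monad (foldM)

-- Print each line and accumulate a character count in the same pass,
-- so nothing needs the whole string after the traversal.
printAndCount :: String -> IO Int
printAndCount = foldM step 0 . lines
  where
    step n l = do
      print l
      -- +1 accounts for the newline that `lines` stripped.
      let n' = n + length l + 1
      -- Force the running total so no chain of thunks builds up.
      n' `seq` return n'

notleaky3 :: IO ()
notleaky3 = do
  xs <- readFile "foo"
  n <- printAndCount xs
  print n
```

Since xs is only consumed once here, each chunk should become garbage as soon
as its line has been printed and counted.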

Now imagine those examples in terms of directory traversals instead of reads
from a file.

This would still be a problem even if you replace readFile with readFile':
readFile' f = unsafePerformIO (readFile f)
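
For contrast, here is a rough sketch of the strict style, the direction I
recall darcs moving in.  It is a variant of handleleak that forces what it
needs while the handle is open and then closes it.  The name takeFromFile is
made up for illustration; withFile, hGetContents and evaluate are the standard
ones from base.

```haskell
import Control.Exception (evaluate)
import System.IO (IOMode (ReadMode), hGetContents, withFile)

-- Read at most n characters from the file and close the handle
-- before returning, instead of leaving it to the garbage collector.
takeFromFile :: Int -> FilePath -> IO String
takeFromFile n path =
  withFile path ReadMode $ \h -> do
    contents <- hGetContents h
    let prefix = take n contents
    -- Force every character of the prefix while the handle is open;
    -- withFile closes the handle once this block finishes.
    mapM_ evaluate prefix
    return prefix
```

The price is that the caller gets no laziness at all: anything not forced
inside the withFile block is gone once the handle is closed.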

I hope that helps,
Jason