[Haskell-cafe] A Monad for on-demand file generation?

Mon Jun 30 06:04:54 EDT 2008

Hi,

For an application such as an image gallery generator, which works on a
bunch of input files (assumed to be constant during one run of the
program) and generates or updates a bunch of output files, I often had
the problem of manually tracking which input files a certain output
file depends on, so that I could check their timestamps to decide
whether the file needs to be re-created.
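For illustration, the manual bookkeeping boils down to something like this minimal sketch. It uses `doesFileExist` and `getModificationTime` from the `directory` package. `isUpToDate` is a hypothetical helper name, and treating "at least as new" as up to date is an assumption. The weak point is that every call site has to pass the correct list of inputs by hand.

```haskell
import System.Directory (doesFileExist, getModificationTime)

-- Hypothetical helper: the caller must list every input of `output` by hand.
-- True when the output exists and is at least as new as every listed input.
isUpToDate :: FilePath -> [FilePath] -> IO Bool
isUpToDate output inputs = do
  exists <- doesFileExist output
  if not exists
    then pure False                       -- no output yet: must be created
    else do
      outTime <- getModificationTime output
      inTimes <- mapM getModificationTime inputs
      pure (all (<= outTime) inTimes)     -- no input is newer than the output
```

A call such as `isUpToDate "gallery/index.html" ["photos/a.jpg", "photos/b.jpg"]` (hypothetical paths) silently gives wrong answers as soon as that list drifts out of sync with the code that actually generates the output.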

I thought for a while about how to do this with a monad that does the
bookkeeping for me. Assuming it’s called ODIO (On-Demand IO), I’d like a
piece of code like this:

do file1 <- readFileOD "someInput"
   file2 <- readFileOD "someOtherInput"
   writeFileOD "someOutput" (someComplexFunction file1 file2)

to actually read "someInput" and "someOtherInput", do the calculation
and write the output only if at least one of the inputs has a newer
timestamp than the output.

The problem I stumbled over is the type of (>>=):
  (>>=) :: Monad m => m a -> (a -> m b) -> m b
It means that I cannot "look ahead" to see which files will be written
without actually reading the requested file, since the rest of the
computation is only available as a function of the file's contents. Of
course, such look-ahead is not always possible anyway, although I expect
code like this to be the exception:

do file1 <- readFileOD "someInput"
   file2 <- readFileOD "someOtherInput"
   let filename = decideFileNameBasedOn file2
   writeFileOD filename (someComplexFunction file1 file2)

But assuming that the input does not change during one run of the
program, it should be safe to use unsafeInterleaveIO to open and read an
input only when its contents are actually used. Then readFileOD could
put the timestamp of the read file into a monad-local state, and
writeFileOD could skip the writing if the output is newer than all
inputs listed in that state. The unsafeInterleaveIO’ed file reads would
then be skipped as well, unless they were needed to decide the control
flow of the program.
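A minimal sketch of this mechanism, assuming ODIO is simply `StateT [UTCTime] IO` from the `transformers` package, with the state holding the modification times of all inputs read so far. `runODIO` is a hypothetical helper. Treating an output that is at least as new as every recorded input as current is an assumption, and the sketch does not try to forget reads across (>>). It is only sound under the assumption above that inputs do not change during the run.

```haskell
import Control.Monad (unless)
import Control.Monad.Trans.Class (lift)
import Control.Monad.Trans.State.Strict (StateT, evalStateT, get, modify)
import Data.Time.Clock (UTCTime)
import System.Directory (doesFileExist, getModificationTime)
import System.IO.Unsafe (unsafeInterleaveIO)

-- Sketch: IO plus the modification times of every input read so far.
type ODIO = StateT [UTCTime] IO

-- Hypothetical runner: start with no recorded inputs.
runODIO :: ODIO a -> IO a
runODIO act = evalStateT act []

-- Record the input's timestamp eagerly (a missing file throws here),
-- but defer opening and reading it until the contents are demanded.
readFileOD :: FilePath -> ODIO String
readFileOD path = do
  t <- lift (getModificationTime path)
  modify (t :)
  lift (unsafeInterleaveIO (readFile path))

-- Write only if the output is missing or some recorded input is newer.
-- When the write is skipped, `contents` is never forced, so the deferred
-- reads (and the computation producing `contents`) never run.
writeFileOD :: FilePath -> String -> ODIO ()
writeFileOD path contents = do
  inputTimes <- get
  upToDate <- lift $ do
    exists <- doesFileExist path
    if not exists
      then pure False
      else do
        outTime <- getModificationTime path
        pure (all (<= outTime) inputTimes)
  unless upToDate (lift (writeFile path contents))
```

With this, the first example above runs unchanged. In the second example, computing `filename` forces `file2`, so that input is read in any case, while `file1` is still only read if the write happens.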

One nice thing is that the implementation of (>>) knows that files read
in the first action cannot affect files written in the second, since the
first action's result is discarded. So, in contrast to MonadState, it
can forget about them, which I hope leads to quite good guesses as to
which files are relevant for a certain writeFileOD operation. Also, a
function
  cacheResultOD :: (Read a, Show a) => FilePath -> a -> ODIO a
can be used to write an (expensive) intermediate result, such as the
EXIF information extracted from a file, to disk, so that it can be used
later without re-reading the large image file.
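To illustrate the (>>) idea in isolation, here is a stripped-down sketch that tracks input paths instead of timestamps. `Track`, `readFileT` and `dependencies` are hypothetical names. Since `do { readFileT "a"; rest }` desugars to `readFileT "a" >> rest`, statements whose result is not bound drop out of the dependency set. Note that this deliberately breaks the monad law `m >> n = m >>= \_ -> n`, because the two sides can report different dependencies, so any rewriting of one into the other (by hand or by a library) changes what gets rebuilt.

```haskell
import Control.Monad (ap, liftM)
import System.IO.Unsafe (unsafeInterleaveIO)

-- Hypothetical tracker: threads the input paths whose contents may be in scope.
newtype Track a = Track { runTrack :: [FilePath] -> IO ([FilePath], a) }

instance Functor Track where
  fmap = liftM

instance Applicative Track where
  pure x = Track (\deps -> pure (deps, x))
  (<*>) = ap
  -- The first result is discarded, so nothing it read can flow into the
  -- second action: run the second with the dependencies from before the first.
  Track m *> Track n = Track $ \deps -> do
    _ <- m deps
    n deps

instance Monad Track where
  -- The continuation may use the value, so dependencies accumulate.
  Track m >>= k = Track $ \deps -> do
    (deps', x) <- m deps
    runTrack (k x) deps'
  (>>) = (*>)

-- Record the path as a dependency; defer the actual read until demanded.
readFileT :: FilePath -> Track String
readFileT path = Track $ \deps -> do
  contents <- unsafeInterleaveIO (readFile path)
  pure (path : deps, contents)

-- What a write at this point would depend on.
dependencies :: Track [FilePath]
dependencies = Track (\deps -> pure (deps, deps))
```

For example, `runTrack (readFileT "a.jpg" >> dependencies) []` reports no dependencies, while `runTrack (readFileT "a.jpg" >>= \_ -> dependencies) []` reports `["a.jpg"]`. In neither case is the file actually opened, because its contents are never demanded.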

Is that a sane idea?

I’m also considering using this example for a talk about monads at the
GPN¹ next weekend.

Greetings,
Joachim

¹ http://entropia.de/wiki/GPN7

-- 
Joachim "nomeata" Breitner
  mail: mail at joachim-breitner.de | ICQ# 74513189 | GPG-Key: 4743206C
  JID: nomeata at joachim-breitner.de | http://www.joachim-breitner.de/
  Debian Developer: nomeata at debian.org