[Haskell-cafe] IO Put confusion

Thu Sep 16 20:30:23 EDT 2010

On Wed, Sep 15, 2010 at 12:45 AM, Chad Scherrer <chad.scherrer at gmail.com> wrote:
> Hello,
>
> I need to be able to use strict bytestrings to efficiently build a
> lazy bytestring, so I'm using putByteString in Data.Binary. But I also
> need random numbers, so I'm using mwc-random. I end up in the "IO Put"
> monad, and it's giving me some issues.
>
> To build a random document, I need a random length, and a collection
> of random words. So I have
> docLength :: IO Int
> word :: IO Put
>
> Oh, also
> putSpace :: Put
>
> My first attempt:
> doc :: IO Put
> doc = docLength >>= go
>  where
>  go 1 = word
>  go n = word >> return putSpace >> go (n-1)

I think you misunderstand what return does here, or possibly what >> does.
This function generates docLength random words, but discards all of
them except for the last one. That's what the >> operator does: run
the IO involved in the left action, but discard the result before
running the right action.
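
To make that concrete, here is a small sketch of the first attempt. It
is not your actual code: mkWord is a hypothetical stand-in for the
random 'word' that just numbers the words through an IORef, so the
result is predictable, and the length is passed in directly instead of
coming from docLength.

```haskell
import Data.Binary.Put (Put, putByteString, runPut)
import qualified Data.ByteString.Char8 as B
import qualified Data.ByteString.Lazy.Char8 as L
import Data.IORef (IORef, newIORef, readIORef, writeIORef)

-- Hypothetical stand-in for 'word': yields "w0", "w1", ... in order.
mkWord :: IORef Int -> IO Put
mkWord ref = do
  n <- readIORef ref
  writeIORef ref (n + 1)
  return (putByteString (B.pack ("w" ++ show n)))

putSpace :: Put
putSpace = putByteString (B.pack " ")

-- Same shape as the first attempt: every (>>) here is the IO one.
firstAttempt :: IORef Int -> Int -> IO Put
firstAttempt ref = go
  where
    go 1 = mkWord ref
    go n = mkWord ref >> return putSpace >> go (n - 1)

-- Build an n-word "document" and serialise it.
render :: Int -> IO L.ByteString
render n = do
  ref <- newIORef 0
  p <- firstAttempt ref n
  return (runPut p)

main :: IO ()
main = render 3 >>= L.putStrLn  -- prints "w2"
```

All three word actions run (the counter ends at 3), but only the Put
returned by the last one survives.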

The IO action 'return x' doesn't do any IO, so 'return x >> a' does
nothing, discards x, and then does a, i.e.

return x >> a = a

> Unfortunately, with this approach, you end up with a one-word
> document. I think this makes sense because of the monad laws, but I
> haven't checked it.

Yes, the above equation is required to hold for any monad (it is a
consequence of the law that 'return x >>= f = f x', together with
'm >> k' being 'm >>= \_ -> k').
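
Spelled out step by step, using only the monad laws and the default
definition of >>, the reasoning goes roughly like this (written as
comments, since it is a derivation rather than a program):

```haskell
-- return x >> a
--   = return x >>= \_ -> a      -- m >> k  =  m >>= \_ -> k
--   = (\_ -> a) x               -- left identity: return x >>= f = f x
--   = a                         -- apply the lambda
--
-- Applied to the recursive case of the first attempt:
--
-- word >> return putSpace >> go (n-1)
--   = (word >> return putSpace) >> go (n-1)   -- (>>) is infixl
--   = word >> (return putSpace >> go (n-1))   -- associativity law
--   = word >> go (n-1)                        -- the equation above
```

So the first attempt behaves exactly as if putSpace had never been
mentioned, and each word's Put is dropped by the remaining >>.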

>
> Second attempt:
> doc :: IO Put
> doc = docLength >>= go
>  where
>  go 1 = word
>  go n = do
>    w <- word
>    ws <- go (n-1)
>    return (w >> putSpace >> ws)
>
> This one actually works, but it holds onto everything in memory
> instead of outputting as it goes. If docLength tends to be large, this
> leads to big problems.

Here you're using the >> from the Put monad, which appends lazy
ByteStrings rather than sequencing IO actions. The problem is that the
ordering of IO is strict, which means that 'doc' must generate all the
random words before it returns, i.e. it must be completely done before
L.writeFile gets a look-in.
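
A small sketch of what that means in practice. Again mkWord is a
hypothetical stand-in for the random 'word' (it numbers the words
through an IORef), and the length is passed in directly. The counter
shows that every word has already been generated by the time the IO
action returns, before anything has asked for a single byte of output.

```haskell
import Data.Binary.Put (Put, putByteString, runPut)
import qualified Data.ByteString.Char8 as B
import qualified Data.ByteString.Lazy.Char8 as L
import Data.IORef (IORef, newIORef, readIORef, writeIORef)

-- Hypothetical stand-in for 'word': yields "w0", "w1", ... in order.
mkWord :: IORef Int -> IO Put
mkWord ref = do
  n <- readIORef ref
  writeIORef ref (n + 1)
  return (putByteString (B.pack ("w" ++ show n)))

putSpace :: Put
putSpace = putByteString (B.pack " ")

-- Same shape as the second attempt: the inner (>>) is the Put one.
secondAttempt :: IORef Int -> Int -> IO Put
secondAttempt ref = go
  where
    go 1 = mkWord ref
    go n = do
      w  <- mkWord ref
      ws <- go (n - 1)
      return (w >> putSpace >> ws)

-- Returns how many words were generated before the Put was handed
-- back, together with the first few bytes of the document.
probe :: Int -> IO (Int, L.ByteString)
probe n = do
  ref  <- newIORef 0
  p    <- secondAttempt ref n
  made <- readIORef ref
  return (made, L.take 8 (runPut p))

main :: IO ()
main = probe 1000 >>= print  -- prints (1000,"w0 w1 w2")
```

The document comes out right, but all 1000 word actions have run, and
their Puts are being held, before runPut is even called.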

It turns out the problem you're trying to solve isn't actually simple
at all. Some of the best approaches to efficient incremental IO are
quite involved - e.g. Iteratees. But your case could be made a great
deal easier if you used a pure PRNG instead of one requiring IO. If
you could make word a pure function, something like

word :: StdGen -> (StdGen, Put)

(which is more or less the same as word :: State StdGen Put), then
you'd be able to use it lazily and safely.
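
Here is a rough sketch of that idea, under some loudly stated
assumptions. To keep it free of extra packages, Gen and next are a
hypothetical toy generator standing in for StdGen and randomR (not a
good PRNG, just enough to thread a seed), vocabulary is made up, and
the length is passed in rather than drawn at random. The pair order
(generator, Put) follows the signature suggested above.

```haskell
import Data.Binary.Put (Put, putByteString, runPut)
import qualified Data.ByteString.Char8 as B
import qualified Data.ByteString.Lazy.Char8 as L

-- Hypothetical stand-in for StdGen: a tiny linear congruential generator.
newtype Gen = Gen Int

-- One step of the toy generator: a non-negative number and the new state.
next :: Gen -> (Int, Gen)
next (Gen s) = (s' `div` 65536, Gen s')
  where
    s' = (s * 1103515245 + 12345) `mod` 2147483648

-- Made-up word list, purely for illustration.
vocabulary :: [B.ByteString]
vocabulary = map B.pack ["alpha", "beta", "gamma", "delta"]

-- Pure counterpart of 'word': no IO, the generator is threaded by hand.
word :: Gen -> (Gen, Put)
word g = (g', putByteString (vocabulary !! (i `mod` length vocabulary)))
  where
    (i, g') = next g

putSpace :: Put
putSpace = putByteString (B.pack " ")

-- A document of n words; every (>>) here is the Put one.
doc :: Int -> Gen -> Put
doc 1 g = snd (word g)
doc n g = w >> putSpace >> doc (n - 1) g'
  where
    (g', w) = word g

main :: IO ()
main = L.putStrLn (L.take 60 (runPut (doc 1000000 (Gen 42))))
```

Because nothing here lives in IO, the rest of the document is just an
unevaluated thunk until the consumer asks for it, so taking the first
60 bytes should only need the first handful of words. With the real
StdGen you would replace Gen and next, and L.writeFile could then pull
the output through as it writes.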