[Haskell-cafe] conduit: Finalize field in PipeM

Fri May 4 10:40:22 CEST 2012

On Thu, May 3, 2012 at 11:11 PM, Paolo Capriotti <p.capriotti at gmail.com> wrote:
> Hi,
> I'm trying to write a function to convert a conduit to a Pipe (from
> pipes-core), and I'm having trouble understanding why the `Finalize`
> field in a `PipeM` constructor returns `r`. This makes it impossible
> to create it from the corresponding `M` constructor in pipes-core,
> since `M` includes an exception handler which is not guaranteed to run
> fully, hence may not provide a return value. `HaveOutput` presents a
> similar problem.
>
> From a cursory look at conduit's code, it doesn't look like the return
> value of `Finalize` is ever used, and it seems that the conduits that
> actually manage to define it either have `()` as return type or just
> throw an exception. The definition of `lift` for the `MonadTrans`
> instance duplicates the base monad action there, which doesn't look
> quite right to me (isn't that supposed to contain a _cleanup_
> action?).
>
> Shouldn't the constructor be changed to something like `PipeM (m (Pipe
> i o m r)) (Finalize m ())` or am I completely off base here?
>
> Thanks.
>
> BR,
> Paolo

If you look at Source and Conduit, the `r` parameter is already set to
`()`, so in practice[1], the issue only applies to Sink. And since
`Sink` cannot have a `HaveOutput` constructor, the question is only
for the `PipeM` constructor of a `Sink`.

`pipe`ing together values always requires that the left value contain
an `r` of `()`, so theoretically this only applies to monadic
composition. If you look in the code, you can see the monadic bind
*does*, in fact, use this value[2]:

    PipeM mp c >>= fp = PipeM ((>>= fp) `liftM` mp) (c >>= pipeClose . fp)

Likewise, this would apply to `pipeClose`[3]. But `pipeClose` is far
less interesting than monadic bind, since it is really only used for
generating new Finalize values. So let's just focus on monadic bind.
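
To make the types in that bind clause concrete, here is a minimal sketch. It is not the real Data.Conduit.Internal code: I've assumed a `Pipe` trimmed down to `Done` and `PipeM` (with `Done` carrying only the result), and `runFinalize` is a hypothetical helper added for illustration.

```haskell
import Control.Monad (ap, liftM)

-- Simplified stand-ins for conduit's internal types (assumption: only
-- the two constructors needed to show the bind clause are kept).
data Finalize m r = FinalizePure r | FinalizeM (m r)

data Pipe i o m r
  = Done r
  | PipeM (m (Pipe i o m r)) (Finalize m r)

-- Hypothetical helper: run a finalizer in the base monad.
runFinalize :: Monad m => Finalize m r -> m r
runFinalize (FinalizePure r) = return r
runFinalize (FinalizeM mr) = mr

instance Monad m => Functor (Finalize m) where fmap = liftM
instance Monad m => Applicative (Finalize m) where
  pure = FinalizePure
  (<*>) = ap
instance Monad m => Monad (Finalize m) where
  FinalizePure r >>= f = f r
  FinalizeM mr >>= f = FinalizeM (mr >>= runFinalize . f)

-- Closing a pipe early yields a finalizer that still produces an `r`.
pipeClose :: Monad m => Pipe i o m r -> Finalize m r
pipeClose (Done r) = FinalizePure r
pipeClose (PipeM _ c) = c

instance Monad m => Functor (Pipe i o m) where fmap = liftM
instance Monad m => Applicative (Pipe i o m) where
  pure = Done
  (<*>) = ap
instance Monad m => Monad (Pipe i o m) where
  Done r >>= fp = fp r
  -- c :: Finalize m r1 and fp :: r1 -> Pipe i o m r2, so the new
  -- finalizer needs the r1 from c before it can close fp's result.
  PipeM mp c >>= fp = PipeM (liftM (>>= fp) mp) (c >>= pipeClose . fp)
```

In this sketch, closing `PipeM mp c >>= fp` early runs `c`, feeds its result to `fp`, and then closes whatever pipe `fp` returns. If `c` had type `Finalize m ()`, there would be nothing to pass to `fp`.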

In the code above, you can see that `fp` is of type `r1 -> Pipe i o m
r2`. As such, `c` *must* return a value of type `r1`, not `()`. An
alternate approach here would be to scrap the `Finalize` field
entirely, and redefine `PipeM` as simply:

    PipeM (m (Pipe i o m r))

The problem with this approach is that it can force unnecessary work
in some cases. Consider the case of `sourceFile`. When it comes time
to provide a new chunk of data, the code looks something like:

    foo = PipeM
        (readChunk handle >>= \bs -> return (HaveOutput foo closeHandle bs))
        closeHandle

If we didn't have that Finalize field, closing the source early would
force us to read an extra chunk of data even if it's not necessary.
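
The difference can be sketched with a fake file handle that just counts reads and closes. Everything here is a made-up model rather than conduit's actual code: `SourceA`/`SourceB`, `Fake`, `readChunk`, `closeHandle`, and `countsAfter` are hypothetical names, and the finalizer is simplified to a plain `m ()` since a Source has `r` fixed to `()`.

```haskell
import Data.IORef (IORef, modifyIORef, newIORef, readIORef)

-- Variant A: the monadic constructor carries its own finalizer.
data SourceA m o
  = HaveOutputA (SourceA m o) (m ()) o
  | PipeMA (m (SourceA m o)) (m ())
  | DoneA

-- Variant B: no finalizer on the monadic constructor.
data SourceB m o
  = HaveOutputB (SourceB m o) (m ()) o
  | PipeMB (m (SourceB m o))
  | DoneB

closeA :: Monad m => SourceA m o -> m ()
closeA (HaveOutputA _ c _) = c
closeA (PipeMA _ c) = c              -- finalizer is available immediately
closeA DoneA = return ()

closeB :: Monad m => SourceB m o -> m ()
closeB (HaveOutputB _ c _) = c
closeB (PipeMB mp) = mp >>= closeB   -- must run the action to find a finalizer
closeB DoneB = return ()

-- A fake handle that only counts how often it is read and closed.
data Fake = Fake { readCount :: IORef Int, closeCount :: IORef Int }

readChunk :: Fake -> IO String
readChunk h = modifyIORef (readCount h) (+ 1) >> return "chunk"

closeHandle :: Fake -> IO ()
closeHandle h = modifyIORef (closeCount h) (+ 1)

sourceA :: Fake -> SourceA IO String
sourceA h = PipeMA
  (readChunk h >>= \bs -> return (HaveOutputA (sourceA h) (closeHandle h) bs))
  (closeHandle h)

sourceB :: Fake -> SourceB IO String
sourceB h = PipeMB
  (readChunk h >>= \bs -> return (HaveOutputB (sourceB h) (closeHandle h) bs))

-- Run an action against a fresh fake handle; report (reads, closes).
countsAfter :: (Fake -> IO ()) -> IO (Int, Int)
countsAfter act = do
  h <- Fake <$> newIORef 0 <*> newIORef 0
  act h
  (,) <$> readIORef (readCount h) <*> readIORef (closeCount h)
```

In this model, `countsAfter (closeA . sourceA)` closes the handle without reading from it, while `countsAfter (closeB . sourceB)` has to perform one read before it can reach the finalizer.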

I think the underlying point of distinction between conduit and
pipes-core here is that, in conduit, a Pipe of type `Pipe i o m r` is
*required* to provide an `r` value ultimately. If I understand
correctly, this is not the case in pipes-core. I believe this also
explains your question about the `MonadTrans` instance: `Finalize` is
not simply "clean up resources," it's "clean up resources *and*
return the required `r` value." That's why we duplicate the base monad
action: it's the only way to get a value of type `r` to return.
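
As a rough illustration of that last point, here is a sketch of what such a `lift` looks like on a cut-down `Pipe`. I've written it as a standalone function, `liftPipe`, rather than a real `MonadTrans` instance, and `runPipe` and `closePipe` are hypothetical helpers; treat the whole thing as a model under those assumptions, not the library's code.

```haskell
import Control.Monad (liftM)

-- Simplified stand-ins for conduit's internal types (assumption).
data Finalize m r = FinalizePure r | FinalizeM (m r)

data Pipe i o m r
  = Done r
  | PipeM (m (Pipe i o m r)) (Finalize m r)

-- The base action appears twice: once as the normal step, once as the
-- finalizer, because either path has to end up with a value of type r.
liftPipe :: Monad m => m r -> Pipe i o m r
liftPipe mr = PipeM (liftM Done mr) (FinalizeM mr)

-- Hypothetical helper: run the pipe normally to completion.
runPipe :: Monad m => Pipe i o m r -> m r
runPipe (Done r) = return r
runPipe (PipeM mp _) = mp >>= runPipe

-- Hypothetical helper: close the pipe early, still yielding an r.
closePipe :: Monad m => Pipe i o m r -> m r
closePipe (Done r) = return r
closePipe (PipeM _ c) = case c of
  FinalizePure r -> return r
  FinalizeM mr -> mr
```

Either way the pipe is consumed, the lifted action runs exactly once: `runPipe` runs it through the first field, `closePipe` through the second.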

Michael

[1] I say in practice, because someone could in theory create some
type which is not a Source, Sink, or Conduit.
[2] https://github.com/snoyberg/conduit/blob/master/conduit/Data/Conduit/Internal.hs#L186
[3] https://github.com/snoyberg/conduit/blob/master/conduit/Data/Conduit/Internal.hs#L156