[Haskell-cafe] Conduit Best Practices for leftover data

Thu Apr 12 08:25:24 CEST 2012

Hello,
I am interested in the argument to Done, namely, leftover data. More
specifically, when implementing a conduit/sink, what should the
conduit specify for the (Maybe i) argument to Done in the following
scenarios (Please note that these scenarios only make sense if the
type of 'i' is something in Monoid):

1) The conduit outputted the last thing that it felt like outputting,
and exited willfully. There seem to be two options here - a) the
conduit/sink should greedily gather up all the remaining input in the
stream and mconcat them, or b) Return the part of the last thing that
never got represented in any part of anything outputted. Option b
seems to make the most sense here.

2) Something upstream produced Done, so the second argument to
NeedInput gets run. This is guaranteed to be run at the boundary of an
item, so should it always return Nothing? Instead, should it remember
all the input it has consumed for the current (yet-to-be-outputted)
element, so it can let Data.Conduit know that, even though the conduit
appeared to consume the past few items, it actually didn't (because it
needs more input items to make an output)? Remembering this sequence
could potentially have disastrous memory usage. On the other hand, It
could also greedily gather everything remaining in the stream.

3) The conduit/sink encountered an error mid-item. In general, is
there a commonly-accepted way to deal with this? If a conduit fails in
the middle of an item, it might not be clear where it should pick up
processing, so the conduit probably shouldn't even attempt to
continue. It would probably be good to return some notion of where it
was in the input when it failed. It could return (Done (???) (Left
errcode)) but this requires that everything downstream in the pipeline
be aware of Errcode, which is not ideal.I could use MonadError along
with PipeM, but this approach completely abandons the part of the
stream that has been processed successfully. I'd like to avoid using
Exceptions if at all possible.

It doesn't seem that a user application even has any way to access
leftover data anyway, so perhaps this discussion will only be relevant
in a future version of Conduit. At any rate, any feedback you could
give me on this issue would be greatly appreciated.

Thanks,
Myles