[Haskell-cafe] Iteratees again (Was: How on Earth Do You Reason about Space?)

Aleksandar Dimitrov aleks.dimitrov at googlemail.com
Thu Jun 2 14:59:04 CEST 2011


Hi Ketil,

> By the way, what is the advantage of using iteratees here?  For my
> testing, I just used:

My initial move to iteratees was more of a snap decision, made when I was still
using bytestring-trie and was having immense memory consumption problems.

bytestring-trie uses strict ByteStrings as keys, and since I was only getting
lazy ByteStrings, the only way to make them strict would have been
(S.concat . L.toChunks) (L and S being the lazy and strict ByteString imports),
which felt *wrong*.
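For illustration, the conversion quoted above can be sketched as follows. `Data.Map` stands in for bytestring-trie's trie here (an assumption, made so the example only needs the `bytestring` and `containers` packages that ship with GHC), and `countWords` is a hypothetical name. Since bytestring 0.10, `L.toStrict` does the same job as `S.concat . L.toChunks`.

```haskell
import Data.List (foldl')
import qualified Data.ByteString as S
import qualified Data.ByteString.Lazy as L
import qualified Data.ByteString.Lazy.Char8 as LC
import qualified Data.Map.Strict as M

-- Flatten a lazy ByteString's list of chunks into one strict ByteString.
toStrictKey :: L.ByteString -> S.ByteString
toStrictKey = S.concat . L.toChunks

-- Hypothetical example: count word frequencies from lazy input, using
-- strict ByteStrings as keys (Data.Map in place of a trie).
countWords :: L.ByteString -> M.Map S.ByteString Int
countWords = foldl' bump M.empty . LC.words
  where
    bump m w = M.insertWith (+) (toStrictKey w) 1 m
```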

In short, I thought iteratees would give me enough magic fairy dust to actually
have decent control over how much data I'm holding in RAM at any given point.
That was not the case: I didn't know about the pointer mechanics of strict
ByteStrings (a slice of a strict ByteString points into, and so keeps alive, the
whole buffer it was cut from), and hence was oblivious to the bad impact that
would have on garbage collection performance.
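The retention problem and its usual remedy can be sketched like this. Slices produced by functions such as `SC.words` or `S.take` share the parent chunk's buffer, and `S.copy` (a real `Data.ByteString` function) gives a slice a compact buffer of its own so the parent chunk can be collected. `detachedWords` is a hypothetical helper name.

```haskell
import qualified Data.ByteString as S
import qualified Data.ByteString.Char8 as SC

-- Split a chunk into words. Each word returned by SC.words is a slice
-- that still points into the chunk's buffer, so keeping even one of them
-- (e.g. as a map key) keeps the entire chunk alive. S.copy gives every
-- word a fresh buffer of its own, so the chunk can be garbage collected
-- once nothing else refers to it.
detachedWords :: S.ByteString -> [S.ByteString]
detachedWords = map S.copy . SC.words
```

The trade-off is one extra allocation and copy per word, which pays off when the words are retained much longer than the chunks they came from.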

Even so, I think I can still justify using iteratees in the current design:
a) I don't like lazy IO (conceptually);
b) I'm going to write a left fold somewhere anyway, so I might as well use a
   decent infrastructure for it;
c) I can strictly control the chunk size, and I won't suffer any bad effects
   from accidental eager evaluation somewhere down the pipe.
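Points b) and c) together describe a strict left fold over bounded chunks. The sketch below hand-rolls that idea with plain `bytestring` I/O; it is not the API of the iteratee package, and `foldFileChunks` is a hypothetical name. At any moment only the current chunk (at most `chunkSize` bytes, assumed positive) and the accumulator are live.

```haskell
{-# LANGUAGE BangPatterns #-}
import qualified Data.ByteString as S
import System.IO (IOMode (ReadMode), withBinaryFile)

-- Hypothetical helper: strictly fold `step` over a file, reading at most
-- chunkSize bytes per call. hGetSome returns an empty chunk only at EOF.
-- The bang pattern forces the accumulator to WHNF after every chunk, so
-- no chain of unevaluated thunks builds up.
foldFileChunks :: Int -> (a -> S.ByteString -> a) -> a -> FilePath -> IO a
foldFileChunks chunkSize step z path =
  withBinaryFile path ReadMode (go z)
  where
    go !acc h = do
      chunk <- S.hGetSome h chunkSize
      if S.null chunk
        then return acc
        else go (step acc chunk) h
```

For example, `foldFileChunks 65536 (\n c -> n + S.length c) 0 path` would count a file's bytes in reads of at most 64 KiB; changing that one argument is how the chunk size gets tuned.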

Of these, c) is the only really "legitimate" reason (and c) is also the real
reason behind a)): adjusting the chunk size might yield noticeable performance
differences when reading through files that are well into the gigabytes. And the
chunk-size "limit" protects me from an accidental strict fold or similar that
would leave me with a 4GB file in memory.

About a): lazy IO just doesn't "feel" right to me. I want my pure computations
to actually be pure. If I add a ' to one of my functions *within* my pure code
(that is, switch it to its strict variant), this might have *side effects*:
instead of reading in only part of the file, it will now demand the *whole*
file, and that is *quite* a side effect! So suddenly I have to worry about side
effects in my pure code. Ugh. That's why I'm going to continue using iteratees.
I don't know if that's the right justification, but it's a "hey, it works for
me!" justification I can comfortably live with.
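To make the concern about side effects in pure code concrete, here is a small simulation. Lazy readers such as `hGetContents` defer each read with `unsafeInterleaveIO`; below, a counter stands in for disk reads, and the hypothetical `lazyChunks` yields a list whose cells are only "read" when pure code demands them. A lazy consumer such as `sum . take 3` triggers three reads, while a strict fold such as `foldl' (+) 0` triggers all of them.

```haskell
import Control.Exception (evaluate)
import Data.IORef (IORef, modifyIORef', newIORef, readIORef)
import Data.List (foldl')
import System.IO.Unsafe (unsafeInterleaveIO)

-- Simulated lazy input: each cell of the list is produced only when it
-- is demanded, and producing it bumps the counter (standing in for a read).
lazyChunks :: IORef Int -> [Int] -> IO [Int]
lazyChunks _ [] = return []
lazyChunks ref (c : cs) = unsafeInterleaveIO $ do
  modifyIORef' ref (+ 1)
  rest <- lazyChunks ref cs
  return (c : rest)

-- Run a pure consumer over 100 simulated chunks and report its result
-- together with how many chunks it caused to be "read".
readsNeededBy :: ([Int] -> Int) -> IO (Int, Int)
readsNeededBy consume = do
  ref <- newIORef 0
  xs <- lazyChunks ref [1 .. 100]
  result <- evaluate (consume xs)
  readCount <- readIORef ref
  return (result, readCount)
```

The amount of input read is decided by the strictness of the pure consumer, not by anything visible in its type; explicit chunked reading makes that amount part of the IO code instead.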

Besides, I don't think the iteratee interface is all that opaque. I found arrows
in HXT, for example, much more difficult to deal with conceptually. (That said,
I'm still using HDBC over Takusen, because the latter's API just didn't make
sense to me.)

Regards,
Aleks


More information about the Haskell-Cafe mailing list