[Haskell-cafe] ANNOUNCE: enumerator, an alternative iteratee package

Thu Aug 19 13:21:00 EDT 2010

On Wed, Aug 18, 2010 at 23:33, Jason Dagit <dagit at codersbase.com> wrote:
> The main reason I would use iteratees is for performance reasons.  To help
> me, as a potential consumer of your library, could you please provide
> benchmarks for comparing the performance of enumerator with say, a)
> iteratee, b) lazy/strict bytestring, and c) Prelude functions?
> I'm interested in both max memory consumption and run-times.  Using
> criterion and/or progression to get the run-times would be icing on an
> already delicious cake!

Oleg has some benchmarks of his implementation at <
http://okmij.org/ftp/Haskell/Iteratee/Lazy-vs-correct.txt >, which
clock iteratees at about twice as fast as lazy IO. He also compares
them to a native "wc", but his comparison is flawed, because he's
comparing a String iteratee vs byte-based wc.

I'll benchmark my "wc" and "cat" against common alternative
implementations. My expectation it that they will be much slower than
buffers, slightly slower than strict bytestrings, and faster than lazy
bytestrings.

One of the large advantages iteratees have over lazy IO is that space
use is very predictable. While exact numbers depend on the enumerator
and iteratee, they are typically small and constant. For example,
enumFile uses a 4096-byte buffer which is copied to a ByteString[1],
so "cat" will use only about 10 KiB for a file copy. enumHandle lets
this value be tuned, depending on whether you'd like smaller space use
or fewer buffer reads.

[1] I don't know why this is done -- the reuse buffer/copy idiom is
present in Oleg's code, but I suspect just using B.hGet will be more
efficient. I'll do some benchmarks to confirm.

> ListLike is possibly nice, but in the type indexed iteratee implementation
> that I started (but could not finish due to some issues with the type
> indexing) I didn't use it.  ListLike doesn't support type threaded lists at
> all.  On a side note, in my type threaded iteratee library, I initially
> elided StreamChunk but later added something similar in because I found it
> useful.  I can't recall of the top of my head what the reasoning was, but I
> could dig deeper if it interests you.  I was also following a fairly
> faithful re-implementation of John Lato's implementation, just with type
> indexing.  I should probably post my partial library regardless.  Perhaps
> others can find ways around the bits I was stuck on.

If you can recall the reasoning behind using ListLike or StreamChunk,
it would be useful. Their advantages over simply using lists is not
obvious to me.

> I can see seeking as being important as your library moves into new domains
> of use.  Particularly when reading large binary streams when the data is
> sparse.

Though I don't have any personal experience writing Haskell parsers
for sparsely-populated files, I suspect that folds are poorly adapted
to seeking. It will probably be more efficient to implement your own
enumerator or enumeratee, which contains logic for skipping
uninteresting portions of the file.