[Haskell-cafe] stream/bytestring questions

Sun Feb 17 20:01:23 EST 2008

chad.scherrer:
> ByteStrings have given a real performance boost to a lot of Haskell
> applications, and I'm curious why some of the techniques aren't more
> abstracted and widely available. If it's "because it's a big job",
> that's certainly understandable, but maybe there's something I'm
> overlooking that some of the abstractions aren't really desirable.
> 
> My first question is about the stream fusion. This seems (based on the
> ByteString paper) to speed things up quite a lot, but for some reason
> this isn't necessarily the case for lists. Is this just a matter of

yeah, with lists, as compared to bytestrings, there are:

    * more complex operations to fuse
    * allocation is much cheaper (lazy list cons nodes)
    * built in desugaring for build/foldr fusion interferes (enumerations, comprehensions)

so the benefits of fusing lists are less obvious than bytestrings, where
every fusion point knocks out a big array allocation, and they're a bit
more complex to get the full api going.

> the existing fusion for lists doing such a good job in many cases that
> it's hard to beat? The streams paper suggests that some contructors
> aren't optimized away during fusion, but wouldn't this also be a
> problem for fusion in the bytestring context? Are there many cases

probably not, as the issues are mostly to do with deeply nested
comprehensions, which simply aren't done with bytestrings.

> where it helps performance to program to streams directly, rather than

no, using the rules should be fine. you're not supposed to program in
the stream abstraction.

> letting the rules go back and forth between them and lists? I tried to
> do this, but kept getting hung up on the Unlifted stuff; it's not
> exposed (pardon the pun) in the API, and I don't understand why the
> types are different than the usual (), Either a b, (a,b), etc.
> 
> Second, the idea of representing a sequence as a lazy list of strict
> arrays is very appealing. But why represent it so concretely? Wouldn't
> it make sense to do something like
> 
> data ArrayList a i e = Empty | Chunk !(a i e) (ArrayList a i e)
> ?

someone could do that. we chose to  go with the monomorphic case, since
its easier to get the api right, and the performance right.

> Then array fusion could be written for IArray types in general, and
> the ByteString library would become a set of instances with some
> specialization pragmas. Presumably a more general StorableVector could
> be represented as an IArray, and the NDP stuff seems a good fit too,
> so this would make it easy to leverage future work. Seems to separate
> the various concerns a little better, too, but again maybe there's a
> performance issue I'm missing.
> 
> Sorry for the barrage of questions; I guess there's just a lot I'm
> trying to understand better.

Sure. Hope that clears things up.