[Haskell-cafe] The question of ByteString

Fri Nov 2 17:45:00 EDT 2007

On 11/2/07, Andrew Coppin <andrewcoppin at btinternet.com> wrote:
>
> 1. Why do I have to type "ByteString" in my code? Why isn't the compiler
> automatically performing this optimisation for me? (I.e., is there some
> observable property that is changed? Currently the answer is yes: the
> ByteString interface only provides "trancated" Unicode characters. But,
> in principle, that could be changed.)

That's an interesting question; one property the compiler would have
to prove in order to replace a String with a ByteString is that the
given String will be fully evaluated strictly (e.g., not just that it
will be evaluated to WHNF, but that each of its elements will be
demanded). Currently, GHC's demand analyzer doesn't look inside lists
(it only looks inside things that have product types), but it's not
unimaginable. It would be a matter of whether the potential payoff
justifies the complexity of the imagined analysis, as always.

>
> 2. ByteString makes text strings faster. But what about other kinds of
> collections? Can't we do something similar to them that makes them go
> faster?
>
> As I understand it, ByteString is faster due to several factors. First
> of all, it's stricter. Secondly, it's an "unboxed" structure (so you
> eliminate layers of indirection and there's less GC load). Third, it's
> implemented as an array that looks like a linked list. Given how
> ubiquitous lists are in Haskell, "array that looks like a linked list"
> sounds like one seriously useful data type! Yet ByteString seems to be
> the only implementation of this concept - and only for lists on unboxed
> bytes. (Not even unboxed Word16 or anything else, *only* Word8.) If I
> understand this correctly, a ByteString is actually a linked list of
> large array chunks. (This presumably yields fastER random access than a
> plain linked list?) Also, it seems to be possible to create a new array
> which is merely a subrange of an existing one, without any copying; the
> standard array API doesn't seem to provide this, yet it sounds damn useful.
>
> These are the things I'm thinking about. Is there some deep theoretical
> reason why things are the way they are? Or is it merely that nobody has
> yet had time to make something better? ByteString solves the problem of
> text strings (and raw binary data) very nicely, it's just a pitty we
> can't apply some of that know-how more widely...
>

I don't think there's a deep theoretical reason why this doesn't
exist, but I also don't think it's necessarily *just* a matter of no
one having had time yet. As always, there are trade-offs involved, and
people try to avoid introducing *too* many special cases into the
compiler.

Cheers,
Tim

-- 
Tim Chevalier * catamorphism.org * Often in error, never in doubt
"I give fellowship advice, not relationship advice."  -- Michael Sacramento