[Haskell-cafe] ByteString missing rewrite RULES (zipWith' f a = pack . zipWith f a)

Thomas DuBuisson thomas.dubuisson at gmail.com
Tue Oct 5 14:35:06 EDT 2010


>  I don't have a horse in this race; but I am curious as to why
>  you wouldn't ask for `chunkOverhead = 16' as that seems to be
>  your intent as well as what the expression works out to on any
>  machine in common use.

To avoid copying data when performing FFI calls to common cipher
routines (such operations usually work on 128-bit blocks).
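
For reference, this is how bytestring sizes its chunks (from
Data.ByteString.Lazy.Internal, reproduced from memory, so check the
source): the overhead is two machine words, i.e. 16 bytes on 64-bit
machines but only 8 on 32-bit ones.

import Foreign.Storable (sizeOf)

-- Chunks are 32KiB minus the memory-management overhead, which is
-- two machine words: 16 bytes on 64-bit, 8 bytes on 32-bit.
defaultChunkSize :: Int
defaultChunkSize = 32 * k - chunkOverhead
  where k = 1024

chunkOverhead :: Int
chunkOverhead = 2 * sizeOf (undefined :: Int)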

If you have a Haskell program performing full disk encryption (FDE),
it's reasonable to expect large amounts of data to need encrypting
and decrypting.  Reading the data in as a lazy ByteString, you get
strict chunks of 32K - chunkOverhead bytes; on a 32-bit machine that
is a multiple of 64 bits but not of the 128-bit block size.  IOW, for
an operation like "cbc key iv lazyBS" you will:

  1) encrypt 32K - 16B (the largest block multiple that fits),
  2) copy the remainder (8 bytes) and the next chunk (32K - 8B) into
     a new strict bytestring,
  3) encrypt the full 32K chunk,
  4) repeat.
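
A minimal sketch of that realignment (the name reChunk is made up;
this isn't in any library): re-chunk the lazy ByteString so every
emitted strict chunk is a multiple of the cipher block size.  The
B.append is exactly the copy in step 2:

import qualified Data.ByteString as B
import qualified Data.ByteString.Lazy as L

reChunk :: Int -> L.ByteString -> [B.ByteString]
reChunk blockSize = go B.empty . L.toChunks
  where
    -- Emit any trailing partial block (it would need padding).
    go leftover []     = [leftover | not (B.null leftover)]
    -- Carry the partial block left over from the previous chunk.
    go leftover (c:cs) =
      let buf          = leftover `B.append` c  -- the copy, step 2
          usable       = (B.length buf `div` blockSize) * blockSize
          (full, rest) = B.splitAt usable buf
      in if B.null full then go rest cs
                        else full : go rest cs

With chunkOverhead = 16 the leftover would always be empty for a
128-bit cipher and the B.append would never copy anything.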

There are other ways to do it, but the fastest ones involve either
making your delicate and extremely security-sensitive cipher
algorithm work on partial blocks, or building the notion of a linked
list of buffers (i.e. a lazy ByteString) into the implementation
itself (which is often in C).
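
For the second option, the binding might look something like this.
cbc_encrypt_iov is an invented entry point (no real library exports
it); the point is that the C side walks one (pointer, length) pair
per chunk instead of Haskell flattening the string first:

{-# LANGUAGE ForeignFunctionInterface #-}
import qualified Data.ByteString.Lazy as L
import qualified Data.ByteString.Unsafe as BU
import Foreign.C.Types (CChar, CSize)
import Foreign.Marshal.Array (withArray)
import Foreign.Ptr (Ptr)

-- Hypothetical C core that understands scatter/gather input.
foreign import ccall unsafe "cbc_encrypt_iov"
  c_cbcEncryptIov :: Ptr (Ptr CChar) -> Ptr CSize -> CSize -> IO ()

encryptChunked :: L.ByteString -> IO ()
encryptChunked lbs = go [] [] (L.toChunks lbs)
  where
    -- Pin each strict chunk, collecting its pointer and length.
    go ptrs lens (c:cs) =
        BU.unsafeUseAsCStringLen c $ \(p, n) ->
            go (p : ptrs) (fromIntegral n : lens) cs
    -- Everything pinned: hand the arrays to C in one call.
    go ptrs lens [] =
        withArray (reverse ptrs) $ \pArr ->
        withArray (reverse lens) $ \nArr ->
            c_cbcEncryptIov pArr nArr (fromIntegral (length ptrs))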

Unfortunately, this problem only gets worse as you expand your scope.
Hash algorithms use a much wider range of block sizes (512 to 1024
bits are very common), and we don't want to waste 1024 - 64 bits of
every 32KB chunk on alignment, so I didn't request a chunkOverhead
that large.  In situations where people know they'll be hashing large
files and explicitly use lazy ByteStrings, they could use hGetN to
set the chunk size to something agreeable.
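
Something like this, assuming the N-suffixed readers that
Data.ByteString.Lazy exported at the time (signatures from memory;
hGetN additionally takes a byte count):

import qualified Data.ByteString.Lazy as L
import System.IO (openBinaryFile, IOMode(ReadMode))

-- Read in chunks of exactly 32KiB, a multiple of the 64-byte
-- (512-bit) block used by e.g. SHA-256, so no realignment copy
-- is ever needed.
readForHashing :: FilePath -> IO L.ByteString
readForHashing path = do
    h <- openBinaryFile path ReadMode
    L.hGetContentsN (32 * 1024) h  -- handle is closed at EOF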

A less programmer-intensive solution would be to make the chunks a
full 32K.  I'm not sure how much of a performance penalty that would
impose (on all users of bytestring) due to cache effects (or other
issues).  Did anyone measure this when the initial implementation
settled on the current size?

Cheers,
Thomas

