runtime fusion for Data.ByteString.cons ?

Duncan Coutts duncan.coutts at worc.ox.ac.uk
Sun Nov 19 19:12:47 EST 2006


On Sun, 2006-11-19 at 17:54 +0000, Claus Reinke wrote:
> I noticed that ByteString is drastically slower than String if I use
> cons a lot. according to the source, that is expected because of
> the memcpy for the second parameter.
> 
> but it seems to me that construction should be able to play the 
> dual trick to deconstruction (which does not copy the tail, but
> returns an indirection into the original list).

Another approach which I have considered is to do it directly by just
poking into an array but then do cunning things to make it persistent at
yet still O(1) in the best case of single-threaded construction.

Here the representation:

data StringBuilder =
  StringBuilder (ForeignPtr Word8) Int Int (IORef Int)
  -- pointer, offset, length and 'current length'

So just like the ByteString representation but with an extra IORef Int.

The idea is that the IORef tells us the current length of the used part
of the memory block. So by comparing the length at the time this
StringBuilder value was made with the real current length then we can
see if we're using the 'latest' version of the StringBuilder or if it's
been appended/prepended to since.

If we're using the latest value then we can reserve some space by
atomically incrementing the IORef and then directly write into the free
space.

If we're not starting from the latest value then we incur a O(n) penalty
to copy the array. Of course in a sequence of cons/snoc operations to an
old value the copying only happens once since now we have a new unshared
array.

To make this scheme efficient the locking has to be cheap or preferably
someone could figure out a lockless version.

This could usefully be combined with lazy bytestrings (implemented
either as lists or unbalanced trees) to provide time and space efficient
O(1) cons and snoc.

Duncan



More information about the Glasgow-haskell-users mailing list