[Haskell-cafe] bytestring vs. uvector

Sun Mar 8 01:45:03 EST 2009

On Sat, Mar 7, 2009 at 10:23 PM, Alexander Dunlap <alexander.dunlap at gmail.com> wrote:

> Hi all,
>
> For a while now, we have had Data.ByteString[.Lazy][.Char8] for our
> fast strings. Now we also have Data.Text, which does the same for
> Unicode. These seem to be the standard for dealing with lists of bytes
> and characters.
>
> Now we also have the storablevector, uvector, and vector packages.
> These seem to be also useful for unpacked data, *including* Char and
> Word8 values.
>
> What is the difference between bytestring and these new "fast array"
> libraries? Are the latter just generalizations of the former?

There are quite a few overlaps and differences among them.

bytestring is mature and well suited to low-level byte buffer manipulation
and to efficient I/O. This is in part because it uses pinned pointers that
can interoperate easily with foreign code. It used to have an early fusion
rewriting framework, but that was abandoned, so it will not fuse multiple
ByteString traversals into a single loop. This library is widely used, and
also somewhat abused for text I/O.
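
To make the foreign-code point concrete, here is a minimal sketch that hands a
ByteString's buffer to C's memchr without copying. It assumes the
Data.ByteString.Unsafe module from the bytestring package; the helper name
indexOfByte is made up for illustration and is not part of any library.

```haskell
{-# LANGUAGE ForeignFunctionInterface #-}
import qualified Data.ByteString as B
import Data.ByteString.Unsafe (unsafeUseAsCStringLen)
import Data.Word (Word8)
import Foreign.C.Types (CChar, CInt(..), CSize(..))
import Foreign.Ptr (Ptr, minusPtr, nullPtr)

-- Standard C memchr: find the first occurrence of a byte in a buffer.
foreign import ccall unsafe "string.h memchr"
  c_memchr :: Ptr CChar -> CInt -> CSize -> IO (Ptr CChar)

-- Hypothetical helper: offset of the first occurrence of a byte, if any.
-- The ByteString's own buffer is passed to C directly; nothing is copied.
indexOfByte :: Word8 -> B.ByteString -> IO (Maybe Int)
indexOfByte w bs
  | B.null bs = return Nothing  -- avoid calling C on an empty buffer
  | otherwise =
      unsafeUseAsCStringLen bs $ \(ptr, len) -> do
        hit <- c_memchr ptr (fromIntegral w) (fromIntegral len)
        return $ if hit == nullPtr
                   then Nothing
                   else Just (hit `minusPtr` ptr)
```

The "unsafe" in unsafeUseAsCStringLen means the C side must not modify the
buffer or keep the pointer after the callback returns. B.useAsCStringLen is
the copying, safer alternative.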

storablevector is not mature (I'm not even sure it's actually used). It is
a derivative of an old version of the bytestring library, and so has
similar characteristics for interacting with foreign code. It contains some
old fusion code that is sketchy in nature and somewhat likely to be broken.
I'm not sure I would recommend using this library.

uvector is, if my memory serves me correctly, a fork of the vector library.
It uses modern stream fusion, but is under active development and is a
little scary. I'm a little unclear on the exact difference between uvector
and vector. Both use arrays that are not pinned, so they can't be readily
used with foreign code. If you want to use either library, understand that
you're embarking on a bracing adventure.
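
As a rough illustration of what stream fusion is meant to buy you, here is a
small pipeline over unboxed vectors. It is written against the
Data.Vector.Unboxed module of the vector package as I understand its current
API, which may differ from the version under discussion here; the function
name sumDoubledEvens is made up for the example.

```haskell
import qualified Data.Vector.Unboxed as U

-- Hypothetical example: three logical passes (enumerate, filter, map)
-- followed by a fold. With optimisation enabled (-O), stream fusion is
-- intended to compile this into one loop with no intermediate vectors.
sumDoubledEvens :: Int -> Int
sumDoubledEvens n =
  U.sum (U.map (* 2) (U.filter even (U.enumFromTo 1 n)))
```

The same pipeline written with ByteString functions would build an
intermediate buffer at each step, since bytestring no longer fuses.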

text is not mature, and is based on the same modern fusion framework as
uvector and vector. It uses unpinned arrays, but provides functions for
dealing with foreign code. It uses a denser encoding than uvector for text,
and provides text-oriented functions like splitting on word and line
boundaries. Although it's intended for use with Unicode text, it does not
yet provide proper Unicode-aware functions for things like case conversion.
It interacts with bytestring to perform conversion to and from standard
representations like UTF-8, and uses ICU (via the text-icu package) for
other encodings (SJIS, KOI-8, etc.). If you want to use this library,
understand that you're embarking on a bracing adventure.
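
Here is a minimal sketch of the bytestring/text interaction described above:
decode UTF-8 bytes into a Text, then split on whitespace. It assumes the
Data.Text.Encoding functions decodeUtf8' and encodeUtf8 as found in current
releases of text, which may not match the API at the time of writing; the
helper name wordsFromUtf8 is made up for illustration.

```haskell
import qualified Data.ByteString as B
import qualified Data.Text as T
import Data.Text.Encoding (decodeUtf8', encodeUtf8)

-- Hypothetical helper: decode UTF-8 bytes and split them into words.
-- Invalid input is reported as a Left instead of throwing an exception.
wordsFromUtf8 :: B.ByteString -> Either String [T.Text]
wordsFromUtf8 bytes =
  case decodeUtf8' bytes of
    Left err   -> Left (show err)
    Right txt  -> Right (T.words txt)

-- Going the other way: encode a Text as UTF-8 and count the bytes used.
utf8Length :: T.Text -> Int
utf8Length = B.length . encodeUtf8
```

For example, utf8Length (T.pack "caf\233") is 5, because the accented
character takes two bytes in UTF-8 while the Text itself has four characters.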