Data.ByteString candidate 3

Wed Apr 26 03:50:05 EDT 2006

On Wed, 2006-04-26 at 02:16 +0300, Einar Karttunen wrote:

> This is very useful for many purposes and does not mean that there
> should not be a fancy UTF8 module. Rather than arguing about killing
> this, wouldn't it be more productive to create the UTF8 module?

I've been following this thread with a bit of a frown. I can see that some
people want to dish out text over the network *really fast* and thus
would like the ability to emit pure ASCII without the overhead of 4
bytes per character. Still, I don't see the need for a .Latin1 module
next to a .Word8 module.
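To illustrate why a separate .Latin1 module adds little: Latin1 maps each code point from 0 to 255 to the byte with the same value, so a Latin1 view over a Word8 representation is just a pair of conversion functions. The sketch below uses `Data.Array.Unboxed` as a stand-in for a packed Word8 string. `Bytes`, `packLatin1` and `unpackLatin1` are hypothetical names, not the Data.ByteString API.

```haskell
import Data.Array.Unboxed (UArray, listArray, elems)
import Data.Char (chr, ord)
import Data.Word (Word8)

-- Hypothetical packed byte string: one Word8 per element, so pure
-- ASCII costs 1 byte per character instead of 4 for a 32-bit Char.
type Bytes = UArray Int Word8

-- Pack a String as Latin1. Code points above 255 have no Latin1
-- byte, so this sketch rejects the whole string instead of truncating.
packLatin1 :: String -> Maybe Bytes
packLatin1 s
  | all ((< 256) . ord) s =
      Just (listArray (0, length s - 1) (map (fromIntegral . ord) s))
  | otherwise = Nothing

-- Unpack: every byte becomes the Char with the same code point.
unpackLatin1 :: Bytes -> String
unpackLatin1 = map (chr . fromIntegral) . elems
```

For example, `packLatin1 "caf\233"` yields the four bytes `[99,97,102,233]`, and `unpackLatin1` turns them back into the same String.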

When it comes to UTF8, I cringe. Dealing with UTF8 is such a nightmare
to get right, and the bugs won't show up until you test it with some
Chinese text, or any other text full of multi-byte characters. Hence, UTF8
should not be a common interface for application developers. Haskell has
the advantage that changing Char from 8 bits to 32 bits doesn't add to
the space consumption of lists, since each list cell points to a boxed
Char whatever its width. With packed strings the situation is
different, but still, I propose to

- have a library that deals with packed strings of 32-bit Haskell Char
- have a library that deals with packed Word8 sequences

This way, it will hurt if you touch the bare-metal Word8 representation,
but then, using Word8 sequences is an optimisation that you don't reach
for when you start developing an application. A simplistic solution like
this avoids the whole discussion on whether there should be an Ord or
toUpper for Latin1, or how to coerce a packed Latin1 string to a packed
Word8 representation.
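The two-library proposal can be sketched with unboxed arrays standing in for the packed types. `PackedChars`, `PackedBytes`, `packChars`, `encodeCharUtf8` and `encodeUtf8` are hypothetical illustrations, not an existing API. The only bridge from Chars to bytes is an explicit encoding step; here that step is UTF8, which also shows why it needs care: a code point takes 1 to 4 bytes depending on its range.

```haskell
import Data.Array.Unboxed (UArray, listArray, elems)
import Data.Bits (shiftR, (.&.), (.|.))
import Data.Char (ord)
import Data.Word (Word8)

-- Hypothetical packed string of full 32-bit Chars (the default for applications).
type PackedChars = UArray Int Char
-- Hypothetical packed Word8 sequence (the bare-metal representation).
type PackedBytes = UArray Int Word8

packChars :: String -> PackedChars
packChars s = listArray (0, length s - 1) s

-- Encode one code point as UTF8 (1 to 4 bytes).
-- Surrogate code points are not rejected in this sketch.
encodeCharUtf8 :: Char -> [Word8]
encodeCharUtf8 c
  | n < 0x80    = [fromIntegral n]
  | n < 0x800   = [0xC0 .|. lead 6, cont 0]
  | n < 0x10000 = [0xE0 .|. lead 12, cont 6, cont 0]
  | otherwise   = [0xF0 .|. lead 18, cont 12, cont 6, cont 0]
  where
    n = ord c
    -- High bits of the code point, placed in the leading byte.
    lead k = fromIntegral (n `shiftR` k)
    -- A continuation byte: 10xxxxxx carrying 6 bits of the code point.
    cont k = 0x80 .|. (fromIntegral (n `shiftR` k) .&. 0x3F)

-- Going from Chars to bytes always requires an explicit encoding.
encodeUtf8 :: PackedChars -> PackedBytes
encodeUtf8 cs = listArray (0, length bs - 1) bs
  where
    bs = concatMap encodeCharUtf8 (elems cs)
```

For example, `elems (encodeUtf8 (packChars "\20013"))` gives the three bytes `[0xE4,0xB8,0xAD]` for a common Chinese character; only code points above U+FFFF, such as many emoji, take four bytes.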

Axel.