UTF-8 library

Glynn Clements glynn.clements@virgin.net
Thu, 8 Aug 2002 15:51:48 +0100


Axel Simon wrote:

> Sure it is necessary to explicitly state what kind of conversion you want
> when you talk to your C library. So IMHO the only question that really
> remains is whether readFile, withCString and all other standard I/O
> functions should assume UTF-8 by default or the current locale. I opt for
> the latter.

I suggest ISO-8859-1, because:

1. Assuming UTF-8 will result in errors when trying to read 8-bit data
which isn't actually UTF-8.

2. There's a lot more ISO-8859-1 data in existence than UTF-8. 
Actually, there are quite a lot of encodings which are more popular in
the real world than UTF-8.

3. Unicode code points 0..255 correspond exactly to ISO-8859-1, so
decoding is a direct byte-to-code-point mapping that cannot fail.

4. The current locale doesn't tell you anything about the actual
encoding of most of the data streams (files, network connections)
which you are likely to process.
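
To illustrate points 1 and 3, here is a minimal sketch using only the
base libraries. The names decodeLatin1, encodeLatin1 and decodeUtf8 are
made up for this example and are not part of any standard library; the
UTF-8 decoder is a simplified strict one, written only to show where
decoding can fail.

```haskell
import Data.Bits ((.&.), (.|.), shiftL)
import Data.Char (chr, ord)
import Data.Word (Word8)

-- ISO-8859-1: byte n is code point n, so decoding is total.
decodeLatin1 :: [Word8] -> String
decodeLatin1 = map (chr . fromIntegral)

-- Encoding only succeeds for code points 0..255.
encodeLatin1 :: String -> Maybe [Word8]
encodeLatin1 = mapM enc
  where
    enc c
      | ord c <= 0xFF = Just (fromIntegral (ord c))
      | otherwise     = Nothing

-- Strict UTF-8 decoding: Nothing on any malformed sequence
-- (bad lead byte, missing or bad continuation byte, overlong
-- form, surrogate, or value above U+10FFFF).
decodeUtf8 :: [Word8] -> Maybe String
decodeUtf8 [] = Just []
decodeUtf8 (b:bs)
  | b < 0x80           = (chr (fromIntegral b) :) <$> decodeUtf8 bs
  | b .&. 0xE0 == 0xC0 = multi 1 (fromIntegral b .&. 0x1F) 0x80
  | b .&. 0xF0 == 0xE0 = multi 2 (fromIntegral b .&. 0x0F) 0x800
  | b .&. 0xF8 == 0xF0 = multi 3 (fromIntegral b .&. 0x07) 0x10000
  | otherwise          = Nothing
  where
    -- n continuation bytes, payload bits of the lead byte, and the
    -- smallest code point allowed for this sequence length.
    multi :: Int -> Int -> Int -> Maybe String
    multi n lead minCp =
      let (cont, rest) = splitAt n bs
          cp = foldl (\acc c -> (acc `shiftL` 6) .|. (fromIntegral c .&. 0x3F))
                     lead cont
          valid = length cont == n
               && all (\c -> c .&. 0xC0 == 0x80) cont
               && cp >= minCp
               && cp <= 0x10FFFF
               && not (cp >= 0xD800 && cp <= 0xDFFF)
      in if valid then (chr cp :) <$> decodeUtf8 rest else Nothing
```

Given the four bytes 0x63 0x61 0x66 0xE9 ("café" in ISO-8859-1),
decodeLatin1 yields the expected four characters, whereas decodeUtf8
yields Nothing, because 0xE9 announces a three-byte sequence that never
arrives. Conversely, feeding genuine UTF-8 to decodeLatin1 never fails;
it merely yields the wrong characters.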

Note that we're not discussing solutions here, but workarounds. The
only actual "solution" would be to redesign Haskell's I/O and string
handling libraries from scratch without pretending that the
octet/byte/character distinctions can be glossed over.
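
As a rough sketch of what keeping those distinctions explicit might look
like (purely hypothetical; Octets, Encoding and latin1 are names invented
here, not an existing or proposed API), raw data and text get different
types, and the only way between them is a named conversion which may
fail:

```haskell
import Data.Char (chr, ord)
import Data.Word (Word8)

-- Raw data as read from a file or socket: octets, not characters.
newtype Octets = Octets [Word8]
  deriving (Eq, Show)

-- A hypothetical encoding: a pair of conversions, either of which
-- may fail.
data Encoding = Encoding
  { decode :: Octets -> Maybe String
  , encode :: String -> Maybe Octets
  }

-- ISO-8859-1 expressed in that form: decoding always succeeds,
-- encoding fails for code points above 255.
latin1 :: Encoding
latin1 = Encoding
  { decode = \(Octets ws) -> Just (map (chr . fromIntegral) ws)
  , encode = \s -> Octets <$> mapM toOctet s
  }
  where
    toOctet c
      | ord c <= 0xFF = Just (fromIntegral (ord c))
      | otherwise     = Nothing
```

With types along these lines, a function which reads a file would return
Octets, and the caller would have to say which Encoding to apply instead
of having one assumed on its behalf.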

-- 
Glynn Clements <glynn.clements@virgin.net>