Text in Haskell: A PROPOSAL

Ketil Z. Malde ketil@ii.uib.no
08 Aug 2002 13:05:14 +0200


Ashley Yakeley <ashley@semantic.org> writes:

> The notion of "current locale settings" (including newline conventions) 
> bothers me.

Me too. But if we wish to support files stored in different encodings
(e.g. ISO-8859-1, which is the standard here), I don't see how we can
avoid it.
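For concreteness, here is a minimal sketch of what supporting ISO-8859-1 involves. It assumes bytes are represented as lists of Word8, and the function names are hypothetical, not an existing library API. Decoding Latin-1 is a one-to-one byte-to-code-point mapping that cannot fail. Encoding fails for any character above U+00FF.

```haskell
import Data.Char (chr, ord)
import Data.Word (Word8)

-- Hypothetical helper: ISO-8859-1 maps each byte 0x00..0xFF to the code
-- point with the same value, so decoding never fails.
decodeLatin1 :: [Word8] -> String
decodeLatin1 = map (chr . fromIntegral)

-- Hypothetical helper: encoding can fail, because characters above U+00FF
-- have no ISO-8859-1 byte.
encodeLatin1 :: String -> Maybe [Word8]
encodeLatin1 = mapM toByte
  where
    toByte c
      | ord c <= 0xFF = Just (fromIntegral (ord c))
      | otherwise     = Nothing
```

For example, decodeLatin1 [0x62, 0x6C, 0xE5] gives "blå", while encodeLatin1 "€" gives Nothing.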

> Do we really need "text mode" anymore?

I'm not sure I follow you.

>>  With, perhaps, UTF-8 as a reasonable default?

> Perhaps it should _always_ be UTF-8? Or is that too slow in some cases? 

I don't think speed is much of an issue: IO is generally slow compared
to processing, and I suspect speed-critical applications might want to
use Word8 anyway.
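As a rough illustration of the per-character work a UTF-8 default implies, here is a sketch of an encoder. It follows the bit layout of RFC 3629 and assumes lists of Word8 for bytes. The names are hypothetical. Each character costs a comparison or two plus a few shifts and masks. The sketch does not reject surrogate code points.

```haskell
import Data.Bits (shiftR, (.&.), (.|.))
import Data.Char (ord)
import Data.Word (Word8)

-- Hypothetical helper: encode one character as 1 to 4 UTF-8 bytes.
-- Surrogate code points (U+D800..U+DFFF) are not rejected; a real codec should.
encodeCharUtf8 :: Char -> [Word8]
encodeCharUtf8 c
  | n < 0x80    = [fromIntegral n]                              -- 0xxxxxxx
  | n < 0x800   = [0xC0 .|. hi 6, cont 0]                       -- 110xxxxx 10xxxxxx
  | n < 0x10000 = [0xE0 .|. hi 12, cont 6, cont 0]              -- 1110xxxx + 2 continuations
  | otherwise   = [0xF0 .|. hi 18, cont 12, cont 6, cont 0]     -- 11110xxx + 3 continuations
  where
    n = ord c
    -- leading bits of the code point, placed in the lead byte
    hi k = fromIntegral (n `shiftR` k)
    -- six bits of the code point in a 10xxxxxx continuation byte
    cont k = 0x80 .|. fromIntegral ((n `shiftR` k) .&. 0x3F)

encodeUtf8 :: String -> [Word8]
encodeUtf8 = concatMap encodeCharUtf8
```

For example, encodeUtf8 "A" gives [0x41], and encodeCharUtf8 '€' gives [0xE2, 0x82, 0xAC].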

The seek issues are... well, issues. I'd suggest restricting seeking
to handles working on Word8, since that is where it makes sense (an
offset is simply a byte count) and where it is easy to implement.
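One way the restriction could be expressed is with a phantom type parameter on handles. A seek on a text handle is then rejected at compile time. This is a sketch built on today's System.IO, where openBinaryFile, hSetEncoding, utf8 and hSeek exist in GHC's base library. TypedHandle, openByteFile, openTextFile, seekBytes, readByte and closeTyped are hypothetical names and not part of the proposal itself.

```haskell
import Data.Word (Word8)
import System.IO

-- Hypothetical wrapper: the phantom parameter records what the handle carries.
newtype TypedHandle a = TypedHandle Handle

-- Octet handle: binary mode, so no newline or character-encoding translation.
openByteFile :: FilePath -> IOMode -> IO (TypedHandle Word8)
openByteFile path mode = TypedHandle <$> openBinaryFile path mode

-- Text handle: characters are decoded from and encoded to UTF-8.
openTextFile :: FilePath -> IOMode -> IO (TypedHandle Char)
openTextFile path mode = do
  h <- openFile path mode
  hSetEncoding h utf8
  return (TypedHandle h)

-- Seeking is offered only on octet handles, where an offset is a byte count.
-- There is no Char counterpart, so seeking a text handle does not typecheck.
seekBytes :: TypedHandle Word8 -> SeekMode -> Integer -> IO ()
seekBytes (TypedHandle h) = hSeek h

-- On a binary handle, hGetChar yields one Char per byte (values 0..255).
readByte :: TypedHandle Word8 -> IO Word8
readByte (TypedHandle h) = fromIntegral . fromEnum <$> hGetChar h

closeTyped :: TypedHandle a -> IO ()
closeTyped (TypedHandle h) = hClose h
```

Calling seekBytes h AbsoluteSeek 4 on a handle from openByteFile works. Passing a handle from openTextFile to seekBytes is a type error.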

I wonder if anybody is actually *using* non-octet-based encodings
(e.g. UTF-16/UCS-2) in files or sockets (without wrapping the encoded
content in a higher-level protocol, like MIME)? Even if various
standards support them, we might be better off with less complexity,
handling only the *useful* cases, if it turns out the complex cases
don't occur in the real world.
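To illustrate the extra complexity, here is a sketch of serialising text as UTF-16, with hypothetical names. The 16-bit code units must be split into octets in some byte order. Characters outside the Basic Multilingual Plane become surrogate pairs, and every ASCII character gains a zero byte. Byte-order marks are not handled.

```haskell
import Data.Bits (shiftR, (.&.))
import Data.Char (ord)
import Data.Word (Word16, Word8)

-- Hypothetical helper: UTF-16 code units for one character.
-- One unit inside the BMP, a surrogate pair above it.
utf16Units :: Char -> [Word16]
utf16Units c
  | n < 0x10000 = [fromIntegral n]
  | otherwise   = [ fromIntegral (0xD800 + (m `shiftR` 10))   -- high surrogate
                  , fromIntegral (0xDC00 + (m .&. 0x3FF)) ]    -- low surrogate
  where
    n = ord c
    m = n - 0x10000

hiByte, loByte :: Word16 -> Word8
hiByte u = fromIntegral (u `shiftR` 8)
loByte u = fromIntegral (u .&. 0xFF)

-- Turning 16-bit units into octets forces a choice of byte order.
encodeUtf16BE, encodeUtf16LE :: String -> [Word8]
encodeUtf16BE = concatMap (\u -> [hiByte u, loByte u]) . concatMap utf16Units
encodeUtf16LE = concatMap (\u -> [loByte u, hiByte u]) . concatMap utf16Units
```

For example, encodeUtf16BE "Hi" gives [0x00, 0x48, 0x00, 0x69], while encodeUtf16LE "Hi" gives [0x48, 0x00, 0x69, 0x00].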

IMHO, UTF-8 is a good compromise: it will break ISO-8859 encodings,
but only here and there, since 7-bit ASCII characters usually make up
the bulk of the data and are encoded identically in both.
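To see what "here and there" means in practice, here is a sketch of a lenient UTF-8 decoder. The name is hypothetical, and lists of Word8 are assumed for bytes. It substitutes U+FFFD for malformed input and resumes at the next byte. It is a sketch, not a conforming decoder.

```haskell
import Data.Bits (shiftL, (.&.), (.|.))
import Data.Char (chr)
import Data.Word (Word8)

-- Hypothetical helper: decode UTF-8, emitting U+FFFD for each byte that does
-- not start a well-formed sequence, then resuming at the following byte.
decodeUtf8 :: [Word8] -> String
decodeUtf8 [] = []
decodeUtf8 (b : bs)
  | b < 0x80              = chr (fromIntegral b) : decodeUtf8 bs
  | b >= 0xC2 && b < 0xE0 = multi 1 (b .&. 0x1F) 0x80
  | b >= 0xE0 && b < 0xF0 = multi 2 (b .&. 0x0F) 0x800
  | b >= 0xF0 && b < 0xF5 = multi 3 (b .&. 0x07) 0x10000
  | otherwise             = bad      -- stray continuation or invalid lead byte
  where
    bad = '\xFFFD' : decodeUtf8 bs
    -- k continuation bytes expected; minVal rejects overlong forms
    multi k lead minVal =
      let (conts, rest) = splitAt k bs
          n = foldl (\acc c -> (acc `shiftL` 6) .|. fromIntegral (c .&. 0x3F))
                    (fromIntegral lead) conts :: Int
      in if length conts == k
              && all (\c -> c .&. 0xC0 == 0x80) conts
              && n >= minVal && n <= 0x10FFFF
              && not (n >= 0xD800 && n <= 0xDFFF)
           then chr n : decodeUtf8 rest
           else bad
```

Fed the ISO-8859-1 bytes of "blåbær" ([0x62, 0x6C, 0xE5, 0x62, 0xE6, 0x72]), it yields the characters b, l, U+FFFD, b, U+FFFD, r. The ASCII letters survive, and only å and æ are lost.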

-kzm
-- 
If I haven't seen further, it is by standing in the footprints of giants