HTTP and character encodings

Ganesh Sittampalam ganesh at earth.li
Wed Sep 12 23:57:38 CEST 2012


On 12/09/2012 11:09, Christian Maeder wrote:

> My main use-case is simpleHTTP that is bound to the String instance,
> currently. There are no such short-cuts for byte-strings, are there?

That's a good point. I guess I would make simpleHTTP overloaded while I
was making breaking changes anyway.

> I'ld suggest to make a proper byte-string interface first 

What do you mean by "proper"? Unfortunately I don't really have time to
do any substantial refactoring in the near future.

If I had plenty of time, I'd make separate high-level and low-level
interfaces, with encoding handled only in the high-level one.

> and then deprecate the String stuff.

Is it possible to deprecate an instance?

I could perhaps instead provide an escape hatch with a newtype like
UnsafeChar8String or something, either temporarily or permanently.
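
A minimal sketch of what such a newtype might look like. Only the name
UnsafeChar8String comes from the message above; the field accessor and
the helpers toBytes and fromBytes are hypothetical names chosen for
illustration, and none of this is part of the HTTP package.

```haskell
import qualified Data.ByteString.Char8 as C8

-- Hypothetical wrapper: marks a String whose Chars are really raw bytes
-- (each Char is assumed to be in the range 0..255).
newtype UnsafeChar8String = UnsafeChar8String { getUnsafeChar8String :: String }
  deriving (Eq, Show)

-- Hypothetical helper: Char8.pack truncates each Char to 8 bits, which is
-- lossless only while the assumption above holds.
toBytes :: UnsafeChar8String -> C8.ByteString
toBytes = C8.pack . getUnsafeChar8String

-- Hypothetical helper: Char8.unpack maps every byte to the Char with the
-- same code point, performing no decoding at all.
fromBytes :: C8.ByteString -> UnsafeChar8String
fromBytes = UnsafeChar8String . C8.unpack

main :: IO ()
main = do
  let raw = C8.pack "GET / HTTP/1.1"
  -- Going bytes -> wrapper -> bytes gives back the original bytes.
  print (toBytes (fromBytes raw) == raw)
```

The point of the wrapper would be that callers have to opt in to the
byte-per-Char interpretation by name, rather than getting it silently
through a plain String instance.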

> (before calling Char8.pack, strings could be checked or filtered for
> "isAscii")

The problem is more on the download side; if it's a wide encoding like
UTF-16, even 7-bit cleanliness isn't enough to make Char8.unpack safe.
On the upload side, automatically using UTF-8 would probably be good enough.
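
To illustrate both directions, here is a small sketch using only the
bytestring package. The byte sequence is "hi" encoded by hand as
UTF-16LE (the byte order is an assumption made for the example), and
the names utf16leHi and lambdaBytes are made up for illustration.

```haskell
import qualified Data.ByteString as B
import qualified Data.ByteString.Char8 as C8
import Data.Char (isAscii)

-- "hi" in UTF-16LE, written out by hand: every byte is below 0x80.
utf16leHi :: B.ByteString
utf16leHi = B.pack [0x68, 0x00, 0x69, 0x00]

-- Upload side: Char8.pack keeps only the low 8 bits of each Char,
-- so U+03BB (Greek small lambda) becomes the single byte 0xBB.
lambdaBytes :: [Int]
lambdaBytes = map fromIntegral (B.unpack (C8.pack "\955"))

main :: IO ()
main = do
  let decoded = C8.unpack utf16leHi
  -- The result passes a 7-bit check ...
  print (all isAscii decoded)
  -- ... but it is not the original text: NULs are interleaved.
  print (decoded == "hi")
  print (decoded == "h\NULi\NUL")
  print (lambdaBytes == [0xBB])
```

So an isAscii filter catches the truncation on the way up, but says
nothing useful about a downloaded body until the encoding is known.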

Cheers,

Ganesh




