[Haskell-cafe] I/O and utf8

Einar Karttunen ekarttun at cs.helsinki.fi
Wed Jan 11 10:14:44 EST 2006


On 10.01 10:25, Bulat Ziganshin wrote:
> I have a question about this issue: I also want to provide an
> autodetection mechanism that relies on the first bytes of a text file
> to set the proper encoding. What are the standard rules for marking
> UTF-8/UTF-16 encoding in these first bytes?

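A byte order mark (BOM) at the start of the data is used to mark the
encoding (http://en.wikipedia.org/wiki/Byte_Order_Mark): EF BB BF for
UTF-8, FE FF for big-endian UTF-16 and FF FE for little-endian UTF-16.
Most UTF-8 streams lack it, though, and I have not seen it used in
UTF-8 files either, so a missing BOM does not rule out UTF-8.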
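
To make the BOM check concrete, here is a minimal sketch of sniffing the
first bytes, assuming the input is already available as a strict
ByteString. The names BomEncoding and detectBom are made up for this
example and do not come from any existing library. What to do when no
BOM is found is left to the caller.

```haskell
import qualified Data.ByteString as B
import Data.Word (Word8)

-- | Encodings a leading byte order mark can identify
-- (hypothetical type, for illustration only).
data BomEncoding = UTF8 | UTF16BE | UTF16LE
  deriving (Eq, Show)

-- | BOM byte sequences for UTF-8 and the two UTF-16 byte orders.
boms :: [(BomEncoding, [Word8])]
boms =
  [ (UTF8,    [0xEF, 0xBB, 0xBF])
  , (UTF16BE, [0xFE, 0xFF])
  , (UTF16LE, [0xFF, 0xFE])
  ]

-- | Look for a BOM at the start of the input. Returns the encoding it
-- names (if any) and the input with the BOM bytes removed.
-- 'Nothing' only means "no BOM"; the data may still be UTF-8.
detectBom :: B.ByteString -> (Maybe BomEncoding, B.ByteString)
detectBom bs =
  case [ (enc, B.drop (length mark) bs)
       | (enc, mark) <- boms
       , B.pack mark `B.isPrefixOf` bs
       ] of
    ((enc, rest) : _) -> (Just enc, rest)
    []                -> (Nothing, bs)
```

A caller would run detectBom on the first chunk read from the file and
fall back to a default (or a user-supplied) encoding on Nothing.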

Do you plan on supporting things like HTTP, where the character set
only becomes known partway through parsing?

- Einar Karttunen

