[Haskell-cafe] Re: File path programme

Aaron Denney wnoise at ofb.net
Sun Jan 30 17:03:34 EST 2005


On 2005-01-30, Marcin 'Qrczak' Kowalczyk <qrczak at knm.org.pl> wrote:
> Glynn Clements <glynn at gclements.plus.com> writes:
>
>> And it isn't a theoretical issue. E.g. in an environment where EUC-JP
>> is used, filenames may begin with <ESC>$)B (designate JISX0208 to G1),
>> or they may not (because G1 is assumed to contain JISX0208 initially).
>
> I think such encodings are never used as default encodings of a Unix
> locale.
>
>>> The various UTF encodings do not have this particular problem; if a UTF 
>>> string is valid, then it is a unique representation of a unicode string.
>
> BOM is a problem. Unfortunately Unicode mandates that FEFF at the
> start of a UTF-8 text stream is a mark which doesn't belong to the
> text.

Right.

> It provides variants of UTF-16/32 with and without a BOM, but
> UTF-8 only has the variant with a BOM. This makes UTF-8 a stateful
> encoding.

I think you mean "UTF-8 only has the variant without a BOM".  Otherwise
I'd like to see a citation in the standard for this, because that's
not the reading I get from <http://www.unicode.org/faq/utf_bom.html>.
Instead, it seems that whether the BOM is included or not is decided
by the protocol, and that the UTF-8 streams themselves do not include
the BOM.
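
A minimal sketch of that reading, assuming a protocol that permits an
optional leading signature: the protocol layer drops the bytes EF BB BF
(U+FEFF encoded as UTF-8) once, and everything after that is plain,
stateless UTF-8.  The names utf8Bom and stripUtf8Bom are hypothetical,
made up for illustration; they are not from any library.

```haskell
import Data.List (stripPrefix)
import Data.Maybe (fromMaybe)
import Data.Word (Word8)

-- U+FEFF encoded in UTF-8 is the three bytes EF BB BF.
utf8Bom :: [Word8]
utf8Bom = [0xEF, 0xBB, 0xBF]

-- Hypothetical helper (an assumption, not a library function):
-- the protocol layer removes at most one leading signature and
-- passes every other byte through unchanged.
stripUtf8Bom :: [Word8] -> [Word8]
stripUtf8Bom bytes = fromMaybe bytes (stripPrefix utf8Bom bytes)
```

Here stripUtf8Bom [0xEF, 0xBB, 0xBF, 0x68, 0x69] and
stripUtf8Bom [0x68, 0x69] both evaluate to [104,105].  Only the first
signature is removed, so a second U+FEFF stays part of the text.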

-- 
Aaron Denney
-><-


