[Haskell-cafe] Ready for testing: Unicode support for Handle I/O

Simon Marlow marlowsd at gmail.com
Wed Feb 4 08:31:20 EST 2009


Duncan Coutts wrote:
> On Tue, 2009-02-03 at 11:03 -0600, John Goerzen wrote:
> 
>> Will there also be something to handle the UTF-16 BOM marker?  I'm not
>> sure what the best API for that is, since it may or may not be present,
>> but it should be considered -- and could perhaps help autodetect encoding.
> 
> I think someone else mentioned this already, but utf16 (as opposed to
> utf16be/le) will use the BOM if it's present.
> 
> I'm not quite sure what happens when you switch encoding, presumably
> it'll accept and consider a BOM at that point.

Yes; the utf16 and utf32 encodings accept a BOM (and generate a BOM in 
write mode).  This caused interesting bugs when re-decoding after 
switching encodings, because the BOM constitutes state in the decoder, 
which means that decoding is not necessarily repeatable unless you save the 
state (which iconv doesn't provide a way to do).
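
To make the BOM behaviour concrete, here is a minimal sketch, assuming a GHC
whose System.IO exports hSetEncoding, utf16 and utf16le, and that the
bytestring and directory packages are available.  The helper name
encodedBytes and the temp-file prefix are made up for illustration; they are
not part of the library.  It writes the same string through each encoding and
looks at the raw bytes that land in the file.

```haskell
import qualified Data.ByteString as B
import System.Directory (getTemporaryDirectory, removeFile)
import System.IO

-- Hypothetical helper: write a string through a Handle using the given
-- encoding, then return the raw bytes that ended up in the file.
encodedBytes :: TextEncoding -> String -> IO B.ByteString
encodedBytes enc str = do
  tmp <- getTemporaryDirectory
  (path, h) <- openTempFile tmp "bom-demo.txt"
  hSetEncoding h enc        -- choose the encoding before writing anything
  hPutStr h str
  hClose h
  bytes <- B.readFile path  -- strict read, so the file can be removed next
  removeFile path
  return bytes

main :: IO ()
main = do
  withBom    <- encodedBytes utf16   "hi"  -- BOM-aware encoding
  withoutBom <- encodedBytes utf16le "hi"  -- fixed byte order, no BOM
  print (B.unpack withBom)
  print (B.unpack withoutBom)
```

The utf16 output should be two bytes longer than the utf16le output; those two
bytes are the BOM.  Since the BOM is only emitted (or consumed) once at the
start of the stream, whether the codec has already passed it is exactly the
kind of decoder state described above.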

Are there other encodings that have this kind of state?  If so, then they 
might be restricted to NoBuffering at least when switching encodings.

Cheers,
	Simon

