[Haskell-cafe] Re: PROPOSAL: New efficient Unicode string library.

Deborah Goldsmith dgoldsmith at mac.com
Wed Sep 26 21:49:59 EDT 2007


On Sep 26, 2007, at 11:06 AM, Aaron Denney wrote:
>> UTF-16 has no advantage over UTF-8 in this respect, because of
>> surrogate pairs and combining characters.
>
> Good point.

Well, not so much. As Duncan mentioned, it's a matter of what the most  
common case is. UTF-16 is effectively fixed-width for the majority of  
text in the majority of languages. Combining sequences and surrogate  
pairs are relatively infrequent.
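
To make the "effectively fixed-width" point concrete, here is a minimal Python sketch. It is not from the original message, and the helper names `utf16_units` and `surrogate_fraction` are made up for illustration. It counts how many 16-bit code units each code point needs: anything in the Basic Multilingual Plane (up to U+FFFF) takes one unit, and only supplementary-plane characters take a surrogate pair.

```python
def utf16_units(ch: str) -> int:
    """Return the number of 16-bit code units UTF-16 needs for one code point."""
    # Code points up to U+FFFF (the BMP) fit in a single unit;
    # everything above that is encoded as a surrogate pair.
    return 1 if ord(ch) <= 0xFFFF else 2


def surrogate_fraction(text: str) -> float:
    """Return the fraction of code points in `text` that need a surrogate pair."""
    if not text:
        return 0.0
    return sum(1 for ch in text if ord(ch) > 0xFFFF) / len(text)


if __name__ == "__main__":
    # Hypothetical sample strings; real-world frequencies depend on the corpus.
    for sample in ["Hello, world", "Привет", "日本語", "clef: \U0001D11E"]:
        units = sum(utf16_units(ch) for ch in sample)
        print(repr(sample), len(sample), units, surrogate_fraction(sample))
```

For text with no supplementary-plane characters the unit count equals the code point count, which is the common case described above. Combining sequences are a separate matter: they span several code points in any encoding, so this sketch does not measure them.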

Speaking as someone who has done a lot of Unicode implementation, I  
would say UTF-16 represents the best time/space tradeoff for an  
internal representation. As I mentioned, it's what's used in Windows,  
Mac OS X, ICU, and Java.
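
The space side of the tradeoff can be checked with a small sketch along the following lines. Again, this is an illustration rather than part of the original message: `encoded_sizes` is a hypothetical helper and the sample strings are arbitrary. It compares the encoded size of the same text in UTF-8 and UTF-16 using Python's built-in codecs.

```python
def encoded_sizes(text: str) -> tuple[int, int]:
    """Return (UTF-8 byte count, UTF-16 byte count) for `text`."""
    # "utf-16-le" is used so that no byte-order mark is added to the count.
    return len(text.encode("utf-8")), len(text.encode("utf-16-le"))


if __name__ == "__main__":
    # Arbitrary samples: ASCII, Cyrillic, and CJK text.
    for sample in ["Hello, world", "Привет", "日本語"]:
        utf8_bytes, utf16_bytes = encoded_sizes(sample)
        print(repr(sample), "UTF-8:", utf8_bytes, "UTF-16:", utf16_bytes)
```

ASCII-only text is half the size in UTF-8, two-byte scripts such as Cyrillic come out the same in both, and CJK text is smaller in UTF-16. Which encoding wins on space therefore depends on the text being stored.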

Deborah


