Haskell Platform Proposal: add the 'text' library

Axel Simon Axel.Simon at in.tum.de
Wed Oct 20 15:45:44 EDT 2010


On Oct 20, 2010, at 19:44, Ian Lynagh wrote:

> Johan wrote:
>> If you process a string code point by code point you might mistakenly
>> confuse a plain "a" (A) with a "å" (A-RING *or* A + COMBINING RING).
>
> But when characters and codepoints are 1:1, you /can/ process code
> point by code point.
>
> Am I missing something?

AFAIK there are scripts that have so many combinations that Unicode
does not have a single codepoint for each character. In Arabic you
can have one of 5 vowel signs on each of the 28 letters. But Unicode
does not provide 5*28 codepoints for the combinations. That is
probably the reason for having these combined characters.
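
To make the letter-plus-vowel-sign case concrete, here is a minimal
sketch using only Data.Char from base. The names behWithFatha and
isCombining are made up for illustration, and BEH with FATHA is just
one example pair. It shows that what a reader sees as one character
is stored as a base letter followed by a combining mark.

```haskell
import Data.Char (GeneralCategory (NonSpacingMark), generalCategory)

-- ARABIC LETTER BEH (U+0628) followed by the vowel sign FATHA (U+064E).
-- (Hypothetical example value, chosen only for illustration.)
behWithFatha :: String
behWithFatha = "\x0628\x064E"

-- True for non-spacing marks, which attach to the preceding base character.
isCombining :: Char -> Bool
isCombining c = generalCategory c == NonSpacingMark

main :: IO ()
main = do
  -- One visible character, but two code points.
  print (length behWithFatha)          -- 2
  -- The letter is a base character, the vowel sign is a combining mark.
  print (map isCombining behWithFatha) -- [False,True]
```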

Mac OS tries to split characters into as many codepoints as
possible, whereas Windows tries to merge them as much as possible. I
don't think there is a good semantics for replace without knowing which
(normal) form you're working in. Normally, search/replace and sorting
on Unicode are specialized algorithms that cannot be reduced to simple
substitutions or permutations.
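
The dependence of replace on the normal form can be sketched as
follows. This is only an illustration on plain String with a
hypothetical helper replaceChar, not a proposal for the text API. The
same visible "å" is written once as a single codepoint and once as
"a" plus a combining ring, and a codepoint-wise replace of 'a' treats
the two differently.

```haskell
-- Naive replacement that works code point by code point.
-- (replaceChar is a hypothetical helper, not a library function.)
replaceChar :: Char -> Char -> String -> String
replaceChar from to = map (\c -> if c == from then to else c)

-- The same visible character in two encodings.
precomposed, decomposed :: String
precomposed = "\x00E5"   -- LATIN SMALL LETTER A WITH RING ABOVE
decomposed  = "a\x030A"  -- 'a' followed by COMBINING RING ABOVE

main :: IO ()
main = do
  -- The two encodings are different code point sequences.
  print (precomposed == decomposed)                           -- False
  -- The precomposed form contains no 'a', so it is left alone.
  print (replaceChar 'a' 'o' precomposed == precomposed)      -- True
  -- The decomposed form has its base letter swapped, giving o + ring.
  print (replaceChar 'a' 'o' decomposed == "o\x030A")         -- True
```

Whether the second or the third result is the "right" one depends on
the form the input is assumed to be in, which is the point above.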

So I suggest just providing functions on codepoints and letting the
user struggle with the rest.

Cheers,
Axel


