Haskell Platform Proposal: add the 'text' library

Ian Lynagh igloo at earth.li
Wed Oct 20 13:44:23 EDT 2010


On Wed, Oct 20, 2010 at 09:57:04AM -0700, Bryan O'Sullivan wrote:
> On Wed, Oct 20, 2010 at 9:52 AM, Johan Tibell <johan.tibell at gmail.com> wrote:
> 
> >
> > I think the right thing to do here is to perform normalization first but
> > I'm not sure.
> 
> 
> Hi, friendly neighbourhood Unicode expert here. Yes, in the case Ian cites,
> you want to perform normalization before doing the replacement. The
> behaviour he demonstrates is normal, expected, and consistent with the
> standard.

OK, so that works with the previous example:

Data.Text Data.Text.IO Data.Text.ICU> let t = pack "z\x0061\x030A\x0061z"
Data.Text Data.Text.IO Data.Text.ICU> t
"za\778az"
Data.Text Data.Text.IO Data.Text.ICU> putStrLn t
zåaz
Data.Text Data.Text.IO Data.Text.ICU> normalize NFC t
"z\229az"
Data.Text Data.Text.IO Data.Text.ICU> putStrLn (normalize NFC t)
zåaz
Data.Text Data.Text.IO Data.Text.ICU> putStrLn (replace (pack "a") (pack "y") (normalize NFC t))
zåyz

but only because, after normalization, characters and code points are
now 1:1. If we were using a character that has no precomposed code
point, e.g. p-ring (probably not a real character, but I understand
there are real examples of this kind):

Data.Text Data.Text.IO Data.Text.ICU> let t = pack "zp\x030Apz"
Data.Text Data.Text.IO Data.Text.ICU> t
"zp\778pz"
Data.Text Data.Text.IO Data.Text.ICU> putStrLn t
zp̊pz
Data.Text Data.Text.IO Data.Text.ICU> normalize NFC t
"zp\778pz"
Data.Text Data.Text.IO Data.Text.ICU> putStrLn (normalize NFC t)
zp̊pz
Data.Text Data.Text.IO Data.Text.ICU> putStrLn (replace (pack "p") (pack "y") (normalize NFC t))
zẙyz

then it doesn't work.
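
To make the p-ring case concrete, here is a minimal sketch of a replacement
that leaves a base character alone when a combining mark follows it. It is
not part of the text or text-icu APIs: `clusters` and `replaceChar` are
hypothetical names, it works on plain String using only `isMark` from
Data.Char, and "base code point plus trailing marks" is only an
approximation of a user-perceived character, not full Unicode grapheme
segmentation.

```haskell
import Data.Char (isMark)

-- Split a string into approximate "characters": one base code point
-- followed by any combining marks attached to it.
-- (Hypothetical helper; an approximation, not full grapheme segmentation.)
clusters :: String -> [String]
clusters []     = []
clusters (c:cs) = (c : marks) : clusters rest
  where (marks, rest) = span isMark cs

-- Replace a single-code-point character only where it stands alone,
-- i.e. where no combining mark follows it.
replaceChar :: Char -> Char -> String -> String
replaceChar from to = concatMap swap . clusters
  where
    swap [c] | c == from = [to]
    swap cluster         = cluster

main :: IO ()
main = putStrLn (replaceChar 'p' 'y' "zp\x030Apz")
```

With this sketch the p that carries the ring is kept and only the bare p
is replaced, which is the behaviour I would have expected from the
example above.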

Johan wrote:
> If you process a string code point by code point you might mistakenly
> confuse a plain "a" (A) with a "å" (A-RING *or* A + COMBINING RING).

But when characters and codepoints are 1:1, you /can/ process code point
by code point.

Am I missing something?


Thanks
Ian


