Unicode support

Ashley Yakeley ashley@semantic.org
Tue, 9 Oct 2001 03:27:31 -0700


At 2001-10-09 02:58, Kent Karlsson wrote:

>In summary:
>
>    code position (=code point): a value between 0000 and 10FFFF.
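Kent's definition translates directly into a range check. The following is a minimal sketch, assuming code points are held in a plain Int; the names isCodePoint, isSurrogate and isScalarValue are hypothetical, not from any existing library. It also separates out the surrogate range D800 to DFFF. Unicode counts these as code points, but they are not valid on their own in well-formed UTF-8 or UTF-32, so a 'Char' type built on this range would have to decide whether to admit them.

```haskell
-- Hypothetical helpers, not from any standard library.

-- A code point (code position) is any value from 0x0000 to 0x10FFFF.
isCodePoint :: Int -> Bool
isCodePoint n = n >= 0 && n <= 0x10FFFF

-- Surrogate code points (0xD800..0xDFFF) are reserved for UTF-16
-- surrogate pairs and are not valid on their own in well-formed UTF-8.
isSurrogate :: Int -> Bool
isSurrogate n = n >= 0xD800 && n <= 0xDFFF

-- A Unicode scalar value is a code point that is not a surrogate.
isScalarValue :: Int -> Bool
isScalarValue n = isCodePoint n && not (isSurrogate n)
```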

Would this be a reasonable basis for Haskell's 'Char' type? At some point 
perhaps there should be a 'Unicode' standard library for Haskell. For 
instance:

encodeUTF8 :: String -> [Word8];
decodeUTF8 :: [Word8] -> Maybe String;
encodeUTF16 :: String -> [Word16];
decodeUTF16 :: [Word16] -> Maybe String;

data GeneralCategory = Letter_Uppercase | Letter_Lowercase | ...
getGeneralCategory :: Char -> Maybe GeneralCategory;

...sorting & searching...

...canonicalisation...

etc. Lots of work for someone.
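The encodeUTF8 and decodeUTF8 signatures proposed above could be implemented roughly along these lines. This is a sketch only, assuming the standard Data.Word, Data.Bits and Data.Char modules; encodeChar and multi are hypothetical helper names. The encoder writes 1 to 4 bytes per character. The decoder returns Nothing on malformed input: truncated sequences, stray continuation bytes, overlong forms, surrogates, and values above 10FFFF. One design choice to note: because a String can contain lone surrogates, this encoder substitutes U+FFFD for them rather than emitting ill-formed bytes.

```haskell
import Control.Monad (guard)
import Data.Bits ((.&.), (.|.), shiftL, shiftR)
import Data.Char (chr, ord)
import Data.Word (Word8)

-- Encode each character as 1 to 4 UTF-8 bytes.
encodeUTF8 :: String -> [Word8]
encodeUTF8 = concatMap (encodeChar . ord)

-- Hypothetical helper: encode a single code point.
encodeChar :: Int -> [Word8]
encodeChar c
  -- Lone surrogates have no well-formed UTF-8 form; substitute U+FFFD.
  | c >= 0xD800 && c <= 0xDFFF = encodeChar 0xFFFD
  | c < 0x80    = [fromIntegral c]
  | c < 0x800   = [lead 0xC0 6, cont 0]
  | c < 0x10000 = [lead 0xE0 12, cont 6, cont 0]
  | otherwise   = [lead 0xF0 18, cont 12, cont 6, cont 0]
  where
    -- Leading byte: marker bits plus the high bits of c.
    lead marker shift = fromIntegral (marker .|. (c `shiftR` shift))
    -- Continuation byte: 10xxxxxx carrying six bits of c.
    cont shift = fromIntegral (0x80 .|. ((c `shiftR` shift) .&. 0x3F))

-- Decode UTF-8, returning Nothing on any malformed sequence.
decodeUTF8 :: [Word8] -> Maybe String
decodeUTF8 [] = Just []
decodeUTF8 (b0 : rest)
  | b0 < 0x80                = (chr (fromIntegral b0) :) <$> decodeUTF8 rest
  | b0 >= 0xC2 && b0 <= 0xDF = multi 1 0x80    (fromIntegral b0 .&. 0x1F)
  | b0 >= 0xE0 && b0 <= 0xEF = multi 2 0x800   (fromIntegral b0 .&. 0x0F)
  | b0 >= 0xF0 && b0 <= 0xF4 = multi 3 0x10000 (fromIntegral b0 .&. 0x07)
  | otherwise                = Nothing  -- stray continuation or invalid byte
  where
    -- Hypothetical helper: append n continuation bytes to the lead bits,
    -- rejecting truncation, overlong forms, surrogates and values > 0x10FFFF.
    multi :: Int -> Int -> Int -> Maybe String
    multi n minVal initial = do
      let (conts, rest') = splitAt n rest
      guard (length conts == n && all isCont conts)
      let c = foldl (\acc b -> (acc `shiftL` 6) .|. (fromIntegral b .&. 0x3F)) initial conts
      guard (c >= minVal && c <= 0x10FFFF && not (c >= 0xD800 && c <= 0xDFFF))
      (chr c :) <$> decodeUTF8 rest'
    -- Continuation bytes have the form 10xxxxxx.
    isCont b = b .&. 0xC0 == 0x80
```

For example, encodeUTF8 "\x20AC" (the euro sign) gives [0xE2,0x82,0xAC], and decodeUTF8 [0xC0,0x80] gives Nothing because it is an overlong encoding of U+0000. encodeUTF16 and decodeUTF16 would follow the same pattern, splitting code points above FFFF into surrogate pairs.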

-- 
Ashley Yakeley, Seattle WA