Unicode/internationalization issues

Bulat Ziganshin bulat.ziganshin at gmail.com
Sun Mar 26 06:22:38 EST 2006


Hello haskell-prime,

Some time ago I planned to open a Unicode/internationalization wiki page
that reflects the current state of the art in this area. Here is the
information I have; please add to it, or correct me if I'm missing
something or wrong.

1. Char supports the full Unicode range (code points 0 to 0x10FFFF,
about 1.1 million characters) instead of just 8-bit characters:
implemented in GHC 6.0 and Hugs 2005 (I'm not sure about the exact
versions).

2. Character classification/conversion routines in Data.Char: all
Unicode characters are handled properly starting with GHC 6.4 and Hugs
2005. The author of this update, Dmitry Golubovsky, also provides it as
an additional library for GHC 6.2.2, and I think his work could be
extended to any compiler that supports "wide" Chars.

3. Unicode support in I/O routines, i.e. the ability to read and write
UTF-8 encoded files and files that use other Unicode byte encodings:
not implemented in any compiler, AFAIK, but there are third-party
libraries: the Streams library, the New I/O library, and even the
CharIO module from the jhc sources.

4. Support for UTF-8 encoded source files: implemented in GHC 6.5 and
jhc. AFAIK, GHC's support is more advanced because it uses the
above-mentioned routines to classify Chars, so you can use any national
characters in identifiers (classified according to their case) and all
other symbols in operators. Because GHC 6.5 supports ONLY UTF-8 encoded
source files, this creates problems when compiling files written for
previous versions of GHC (or for other compilers) that use an 8-bit,
non-UTF-8 encoding with national (> chr 127) characters in comments
and especially in string literals. The GHC team has asked its users
for the best solution to this problem.
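To make point 1 concrete, here is a small check one could run in a current GHC or Hugs where Char is "wide". The names maxCodePoint and gClef are made up for illustration; the G clef (U+1D11E) is simply an arbitrary character outside the 16-bit range.

```haskell
import Data.Char (chr, ord)

-- The largest Char is U+10FFFF, so Char covers 0x110000 code points.
maxCodePoint :: Int
maxCodePoint = ord (maxBound :: Char)

-- U+1D11E MUSICAL SYMBOL G CLEF lies above 0xFFFF,
-- so it fits in a Char only when Char is "wide".
gClef :: Char
gClef = chr 0x1D11E

main :: IO ()
main = do
  print maxCodePoint        -- 1114111
  print (maxCodePoint + 1)  -- 1114112 code points in total
  print (ord gClef)         -- 119070
```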
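For point 2, a sketch of what "all Unicode chars managed properly" means in practice, using functions that Data.Char in present-day base exports (isUpper, isAlpha, toUpper, generalCategory). The particular characters are just examples; zheUpper and zheLower are illustrative names.

```haskell
import Data.Char (GeneralCategory (..), generalCategory, isAlpha, isUpper, toUpper)

-- Cyrillic capital and small letter ZHE (U+0416 / U+0436).
zheUpper, zheLower :: Char
zheUpper = '\x416'
zheLower = '\x436'

main :: IO ()
main = do
  print (map isUpper [zheUpper, zheLower])  -- [True,False]
  print (toUpper zheLower == zheUpper)      -- True
  print (isAlpha '\x3BB')                   -- True (Greek small lambda)
  -- toUpper maps one Char to one Char, so the German sharp s (U+00DF),
  -- whose uppercase form is the two-letter "SS", is left unchanged.
  print (toUpper '\xDF' == '\xDF')          -- True
  print (generalCategory '\x2218')          -- MathSymbol (RING OPERATOR)
```

The one-Char-to-one-Char behaviour of toUpper is worth keeping in mind for a wiki page: full case mappings that change string length are outside what Data.Char offers.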
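For point 3, the core of what a UTF-8 I/O layer has to do can be sketched in plain Haskell 98 plus binary-mode handles. This is only an illustration of the technique, not the API of the Streams library, the New I/O library or jhc's CharIO; all function names (encodeUtf8, decodeUtf8, readFileUtf8, writeFileUtf8) are hypothetical. It assumes well-formed input and does no error checking (surrogates, overlong forms and truncated sequences are not rejected).

```haskell
import Data.Bits (shiftL, shiftR, (.&.), (.|.))
import Data.Char (chr, ord)
import Data.Word (Word8)
import System.IO (IOMode (..), hGetContents, hPutStr, openBinaryFile, withBinaryFile)

-- Encode one Char as 1 to 4 UTF-8 bytes (sketch: surrogates not rejected).
encodeCharUtf8 :: Char -> [Word8]
encodeCharUtf8 c
  | n < 0x80 = [fromIntegral n]
  | n < 0x800 = [0xC0 .|. lead 6, cont 0]
  | n < 0x10000 = [0xE0 .|. lead 12, cont 6, cont 0]
  | otherwise = [0xF0 .|. lead 18, cont 12, cont 6, cont 0]
  where
    n = ord c
    -- high-order bits of n that go into the leading byte
    lead s = fromIntegral (n `shiftR` s)
    -- continuation byte 10xxxxxx carrying 6 bits of n
    cont s = fromIntegral (0x80 .|. ((n `shiftR` s) .&. 0x3F))

encodeUtf8 :: String -> [Word8]
encodeUtf8 = concatMap encodeCharUtf8

-- Decode UTF-8 bytes; assumes well-formed input (no error checking).
decodeUtf8 :: [Word8] -> String
decodeUtf8 [] = []
decodeUtf8 (b : bs)
  | b < 0x80 = chr (fromIntegral b) : decodeUtf8 bs
  | b < 0xE0 = multi 1 (fromIntegral b .&. 0x1F)
  | b < 0xF0 = multi 2 (fromIntegral b .&. 0x0F)
  | otherwise = multi 3 (fromIntegral b .&. 0x07)
  where
    -- fold k continuation bytes (6 payload bits each) into the code point
    multi k acc0 =
      let (conts, rest) = splitAt k bs
          code = foldl (\acc x -> (acc `shiftL` 6) .|. (fromIntegral x .&. 0x3F)) acc0 conts
       in chr code : decodeUtf8 rest

-- Binary-mode handles pass each byte through as a Char in 0..255,
-- so the Unicode decoding/encoding is done entirely by the code above.
readFileUtf8 :: FilePath -> IO String
readFileUtf8 path = do
  h <- openBinaryFile path ReadMode
  raw <- hGetContents h
  return (decodeUtf8 (map (fromIntegral . ord) raw))

writeFileUtf8 :: FilePath -> String -> IO ()
writeFileUtf8 path s =
  withBinaryFile path WriteMode $ \h ->
    hPutStr h (map (chr . fromIntegral) (encodeUtf8 s))

main :: IO ()
main = do
  print (encodeUtf8 "\x416")  -- [208,150]
  let s = "h\xE9llo \x416 \x20AC \x1D11E"
  print (decodeUtf8 (encodeUtf8 s) == s)  -- True
```

Working on bytes behind binary handles keeps the encoding layer independent of whatever the compiler's text-mode handles do, which is roughly the position the third-party libraries are in.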
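For point 4, a small source file of the kind described there, assuming it is saved as UTF-8 and compiled by a GHC that decodes source files as UTF-8 (as GHC 6.5 does, per the text; this sketch has only been reasoned about against present-day GHC). All names are invented for illustration: an uppercase Cyrillic initial makes a constructor/type name, a lowercase one makes an ordinary variable, and a Unicode math symbol (U+2218) forms an operator.

```haskell
module Main where

-- Uppercase Cyrillic initials, so these lex as type/constructor names.
data Цвет = Красный | Зелёный | Синий
  deriving (Show, Eq, Enum, Bounded)

-- Lowercase Cyrillic initial: an ordinary variable name.
всеЦвета :: [Цвет]
всеЦвета = [minBound .. maxBound]

-- U+2218 RING OPERATOR is a symbol character, so it can name an operator.
infixr 9 ∘
(∘) :: (b -> c) -> (a -> b) -> a -> c
f ∘ g = \x -> f (g x)

main :: IO ()
main = do
  print (length всеЦвета)        -- 3
  print ((length ∘ show) Синий)  -- 5: show gives the 5-Char name "Синий"
  -- 6 Chars, although the UTF-8 file stores these letters in 12 bytes;
  -- an 8-bit-encoded file read as UTF-8 would not decode this way.
  print (length "привет")        -- 6
```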

If I haven't mentioned some Unicode/internationalization issue here,
please add it.


-- 
Best regards,
 Bulat                          mailto:Bulat.Ziganshin at gmail.com