Unicode source files

Bulat Ziganshin bulatz at HotPOP.com
Tue May 17 12:20:41 EDT 2005


Hello Simon,

Tuesday, May 17, 2005, 5:30:06 PM, you wrote:

>>> The question is what Alex should see for a unicode character: Alex
>>> currently assumes that characters are in the range 0-255 (you need a
>>> fixed range in order to generate the lexer tables).  One possibility
>>> is to map all Unicode upper-case characters to a single character
>>> code for Alex, and similarly for the other classes of character.
>> 
>> i don't know anything about Alex intrinsics, and can only say that any
>> solution is better to do INSIDE Alex, so other programs using it will
>> also get Unicode support

SM> The right thing to do as far as Alex is concerned is to collapse the
SM> full Char range onto a smaller number of character classes which are
SM> then lexed using the standard DFA lexer.  Alex could figure out the
SM> required character classes automatically.

SM> However, a simpler solution for GHC would be to essentially do this by
SM> hand, since we already know what the character classes for Haskell are
SM> (upper case, lower case, digit etc.), and we already have some code that
SM> determines character classes for Unicode characters (GHC.Unicode).  So
SM> for example you map upper-case unicode character onto 0xfe, lower-case
SM> onto 0xfd, and so on.

imho this can be done inside Alex as a universal solution for all
programs: divide all chars >127 into just a few classes (upper,
lower, other letters, spaces, special chars) and map them to 0xfe, 0xfd
and so on, as you suggest. it will work for the large number of programs
that don't pay special attention to individual >127 chars
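
a minimal sketch of such a mapping, assuming the classifier sits in front of
an ordinary 8-bit lexer. only 0xfe and 0xfd come from this thread; the other
codes and the names `classify` and `classifyAll` are hypothetical, and this is
not how Alex or GHC actually do it:

```haskell
import Data.Char (isAlpha, isLower, isSpace, isUpper, ord)
import Data.Word (Word8)

-- Class codes. 0xfe and 0xfd are the values suggested in the thread;
-- the remaining three are assumptions made for this sketch.
upperCode, lowerCode, letterCode, spaceCode, otherCode :: Word8
upperCode  = 0xfe  -- upper-case letters above 127
lowerCode  = 0xfd  -- lower-case letters above 127
letterCode = 0xfc  -- other letters (no case distinction)
spaceCode  = 0xfb  -- white space above 127
otherCode  = 0xfa  -- everything else ("special chars")

-- | Collapse a Char onto one byte: chars up to 127 keep their own
-- code, every other char is replaced by the code of its class.
classify :: Char -> Word8
classify c
  | ord c < 128 = fromIntegral (ord c)
  | isUpper c   = upperCode
  | isLower c   = lowerCode
  | isAlpha c   = letterCode
  | isSpace c   = spaceCode
  | otherwise   = otherCode

-- | Classify a whole input string before handing it to the lexer tables.
classifyAll :: String -> [Word8]
classifyAll = map classify
```

the lexer tables then stay at 256 entries, and a rule written for "any
upper-case letter" only has to mention 0xfe in addition to 'A'..'Z'. the
original Char still has to be kept alongside the class byte so that the
token text is not lost.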

>> btw, Ruby supports writing numbers in form 1_200_000. how about adding
>> this feature to GHC? ;)

SM> I'm not keen on that.  We don't tend to introduce features that break
SM> Haskell 98 compatibility unless they're quite compelling

i know that such things are not debatable :)  as one book puts it,
"there is no sense in solving a problem once it is known that the
problem has a solution" :)


-- 
Best regards,
 Bulat                            mailto:bulatz at HotPOP.com





More information about the Glasgow-haskell-users mailing list