[Haskell-cafe] Has character changed in GHC 6.8?

Reinier Lamers reinier.lamers at phil.uu.nl
Tue Jan 22 10:57:17 EST 2008


Ian Lynagh wrote:
> On Tue, Jan 22, 2008 at 03:16:15PM +0000, Magnus Therning wrote:
>   
>> On 1/22/08, Duncan Coutts <duncan.coutts at worc.ox.ac.uk> wrote:
>>     
>>> On Tue, 2008-01-22 at 09:29 +0000, Magnus Therning wrote:
>>>       
>>>> I vaguely remember that in GHC 6.6 code like this
>>>>
>>>>   length $ map ord "a string"
>>>>
>>>> being able to generate a different answer than
>>>>
>>>>   length "a string"
>>>>         
>>> That seems unlikely.
>>>       
>> Unlikely yes, yet I get the following in GHCi (ghc 6.6.1, the version
>> currently in Debian Sid):
>>
>>     
>>> map ord "a"
>>>       
>> [97]
>>     
>>> map ord "ö"
>>>       
>> [195,182]
>>     
>
> In 6.6.1:
>
> Prelude Data.Char> map ord "ö"
> [195,182]
> Prelude Data.Char> length "ö"
> 2
>
> there are actually 2 bytes there, but your terminal is showing them as
> one character.
Still, that seems weird to me. A Haskell Char is a Unicode character. An 
"ö" is either a single character (Unicode code point 0xF6, which UTF-8 
encodes as two bytes) or an "o" followed by a combining diaeresis (Unicode 
code point 776, i.e. 0x308). But because the last value is not 776, the 
"ö" here should be just one character. I suspect the two-character string 
comes from the terminal speaking UTF-8 to a GHC that expects Latin-1. 
GHC 6.8 expects UTF-8, so all is fine there.
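
A minimal sketch of that suspicion, not GHC's actual input handling: the 
helper names (utf8Bytes, latin1Decode, misread) are made up for 
illustration, and the encoder only covers code points below 0x800, which 
is enough for "ö". The character is written as '\246' so the example does 
not depend on how the source file itself is encoded.

```haskell
import Data.Bits (shiftR, (.&.), (.|.))
import Data.Char (chr, ord)

-- Encode one Char as UTF-8 bytes.
-- Assumption: only code points below 0x800 are handled (one or two bytes).
utf8Bytes :: Char -> [Int]
utf8Bytes c
  | n < 0x80  = [n]
  | n < 0x800 = [0xC0 .|. (n `shiftR` 6), 0x80 .|. (n .&. 0x3F)]
  | otherwise = error "utf8Bytes: sketch only covers code points below 0x800"
  where
    n = ord c

-- Latin-1 decoding maps every byte to the Char with the same number.
latin1Decode :: [Int] -> String
latin1Decode = map chr

-- What a Latin-1 reader ends up with when the sender wrote UTF-8.
misread :: String -> String
misread = latin1Decode . concatMap utf8Bytes
```

With these definitions, map ord (misread "\246") gives [195,182] and 
length (misread "\246") gives 2, which matches the 6.6.1 session quoted 
above.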

On my MacBook (OS X 10.4), 'ö' also immediately expands to "\303\266" 
(the same two bytes, 195 and 182, written in octal) when I type it in my 
terminal, even outside GHCi. That suggests that the terminal program 
doesn't handle Unicode and immediately escapes non-ASCII bytes.
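
A quick way to check that the escaped form and the GHCi output describe 
the same bytes; the names escapedBytes and sameViaLiteral are only for 
illustration. The first reads the octal digits with readOct, the second 
uses Haskell's own octal escape syntax in a string literal.

```haskell
import Data.Char (ord)
import Numeric (readOct)

-- The two octal escapes the terminal printed, converted to decimal bytes.
escapedBytes :: [Int]
escapedBytes = [fst (head (readOct s)) | s <- ["303", "266"]]

-- The same bytes written with Haskell's \o octal escapes.
sameViaLiteral :: [Int]
sameViaLiteral = map ord "\o303\o266"
```

Both evaluate to [195,182], the list that map ord "ö" returned in 6.6.1.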

Regards,
Reinier

