[Haskell-cafe] Has character changed in GHC 6.8?

Duncan Coutts duncan.coutts at worc.ox.ac.uk
Tue Jan 22 05:36:44 EST 2008


On Tue, 2008-01-22 at 09:29 +0000, Magnus Therning wrote:
> I vaguely remember that in GHC 6.6 code like this
> 
>   length $ map ord "a string"
> 
> being able to generate a different answer than
> 
>   length "a string"

That seems unlikely.

> At the time I thought that the encoding (in my case UTF-8) was “leaking
> through”.  After switching to GHC 6.8 the behaviour seems to have
> changed, and mapping 'ord' on a string results in a list of ints
> representing the Unicode code point rather than the encoding:

Yes. GHC 6.8 treats .hs files as UTF-8 where it previously treated them
as Latin-1.
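
To illustrate the difference, here is a minimal sketch (not GHC's actual
lexer) of what a Latin-1 reading of a UTF-8 encoded literal amounts to:
every byte of the encoding becomes its own Char. The names utf8Bytes and
asLatin1 are made up for this example.

```haskell
import Data.Bits (shiftR, (.&.), (.|.))
import Data.Char (chr, ord)

-- Encode one code point as its UTF-8 bytes (each byte given as an Int).
utf8Bytes :: Char -> [Int]
utf8Bytes c
  | n < 0x80    = [n]
  | n < 0x800   = [0xC0 .|. (n `shiftR` 6), cont n]
  | n < 0x10000 = [0xE0 .|. (n `shiftR` 12), cont (n `shiftR` 6), cont n]
  | otherwise   = [ 0xF0 .|. (n `shiftR` 18), cont (n `shiftR` 12)
                  , cont (n `shiftR` 6), cont n ]
  where
    n = ord c
    -- Continuation byte: 10xxxxxx carrying the low six bits of x.
    cont :: Int -> Int
    cont x = 0x80 .|. (x .&. 0x3F)

-- Hypothetical helper: what a reader that assumes Latin-1 would see if
-- the string were stored as UTF-8, i.e. one Char per encoded byte.
asLatin1 :: String -> String
asLatin1 = map chr . concatMap utf8Bytes
```

With these definitions, map ord (asLatin1 "åäö") gives
[195,165,195,164,195,182], six elements, while the properly decoded
literal gives the three code points [229,228,246].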

>   > map ord "åäö"
>   [229,228,246]
> 
> Is this the case, or is there something strange going on with character
> encodings?

That's what we'd expect. Note that GHCi still uses Latin-1. This will
change in GHC 6.10.

> I was hoping that this would mean that 'chr . ord' would basically be a
> no-op, but no such luck:
> 
>   > chr . ord $ 'å'
>   '\229'
> 
> What would I have to do to get an 'å' from '229'?

Easy!

Prelude> 'å' == '\229'
True
Prelude> 'å' == Char.chr 229
True

Remember, when you type:
Prelude> 'å'

what you really get is:
Prelude> putStrLn (show 'å')

So perhaps what is confusing you is the Show instance for Char, which
converts a Char to a String using a portable ASCII representation, so
non-ASCII characters are displayed as numeric escapes such as '\229'.
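
A small sketch of the distinction, assuming nothing beyond Data.Char. The
names roundTrip and rendered are made up for this example.

```haskell
import Data.Char (chr, ord)

-- chr . ord gives back the same Char it was given.
roundTrip :: Char -> Char
roundTrip = chr . ord

-- The text GHCi displays for a Char value: the Show rendering, in which
-- characters outside ASCII appear as decimal escapes.
rendered :: Char -> String
rendered = show
```

Here roundTrip '\229' == '\229' is True, and rendered '\229' is the
six-character string '\229' including the quotes. To write the character
itself rather than its escaped form, use putStrLn [roundTrip '\229'];
how that appears depends on the encoding used for output.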

Duncan


