handling of multibyte char strings

Simon Marlow simonmar@microsoft.com
Thu, 18 Apr 2002 13:11:03 +0100


> GHC 5.02.[23] seems to be quite strict about literal strings
> in source code.  I presume this is a feature. :-)  However
> this is annoying for people who want to use wide chars in
> strings in source code.  (Though I realise that doing so is
> not very "portable".)
>
> To illustrate, the following program
>
> main :: IO ()
> main =
>     let ja = 'Japanese character' in putChar ja >> putChar '\n'
>
> results in
>
> Compiling Main             ( test-ja.hs, interpreted )
> test-ja.hs:3: error in character literal
> Failed, modules loaded: none.
>
> with both ghc and ghci.

As other folk noted, you should be fine with ISO 8859-1 characters but we
don't have support for Unicode yet.  There has been discussion at
various points in the past on how to do this, and Marcin Kowalczyk has
some ideas on how the Haskell interface to multi-coded text I/O should
look.

It is possible that we could just support UTF-8 in source files without
going the whole hog and providing Unicode I/O, however.  As usual,
contributions are welcome :-)
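
To make the UTF-8 idea concrete, here is a minimal sketch of the decoding
step a lexer would need before it could see a multibyte character literal as
a single Char.  This is an illustration only, not GHC's lexer: the names
decodeUtf8, multi and isCont are hypothetical, and replacing malformed input
with U+FFFD is an assumption (a real compiler might prefer to report a
lexical error).

```haskell
import Data.Bits (shiftL, (.&.), (.|.))
import Data.Char (chr)
import Data.Word (Word8)

-- Hypothetical helper: decode a UTF-8 byte sequence into characters.
-- Malformed, overlong, surrogate or out-of-range sequences yield U+FFFD
-- for the offending lead byte, and decoding resumes at the next byte.
decodeUtf8 :: [Word8] -> String
decodeUtf8 [] = []
decodeUtf8 (b:bs)
  | b < 0x80              = chr (fromIntegral b) : decodeUtf8 bs
  | b >= 0xC2 && b < 0xE0 = multi 1 (fromIntegral b .&. 0x1F) 0x80
  | b >= 0xE0 && b < 0xF0 = multi 2 (fromIntegral b .&. 0x0F) 0x800
  | b >= 0xF0 && b < 0xF5 = multi 3 (fromIntegral b .&. 0x07) 0x10000
  | otherwise             = '\xFFFD' : decodeUtf8 bs
  where
    -- n continuation bytes, payload bits of the lead byte, and the
    -- smallest code point this sequence length may legally encode.
    multi :: Int -> Int -> Int -> String
    multi n lead minCode =
      let (conts, rest) = splitAt n bs
          code  = foldl (\acc c -> (acc `shiftL` 6) .|. (fromIntegral c .&. 0x3F))
                        lead conts
          valid = length conts == n
               && all isCont conts
               && code >= minCode
               && code <= 0x10FFFF
               && not (code >= 0xD800 && code <= 0xDFFF)
      in if valid then chr code : decodeUtf8 rest
                  else '\xFFFD' : decodeUtf8 bs

    -- Continuation bytes have the bit pattern 10xxxxxx.
    isCont :: Word8 -> Bool
    isCont c = c .&. 0xC0 == 0x80

-- Print code points rather than characters, so the result does not
-- depend on how the terminal or runtime encodes output.
main :: IO ()
main = mapM_ (print . fromEnum) (decodeUtf8 [0x61, 0xE3, 0x81, 0x82])
-- prints 97, then 12354 (U+3042, HIRAGANA LETTER A)
```

The sketch covers only the reading side; writing such a Char back out with
putChar is the separate Unicode I/O question mentioned above.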

Cheers,
	Simon