[Haskell-cafe] UTF-8 Strings and GHC

Duncan Coutts duncan.coutts at worc.ox.ac.uk
Thu May 25 07:24:35 EDT 2006


On Thu, 2006-05-25 at 15:02 +0000, Dmitry V'yal wrote: 
> Hi, all.
> I'm writing a GUI app using Haskell and Gtk2Hs. All goes well besides one thing.
> I need to display some messages in Russian and I can't figure out how to handle
> that.
> 
> Gtk uses UTF-8 internally, so I have to pass UTF-8 strings to it somehow.

No, you don't pass UTF-8 strings to it; you pass Haskell Unicode Strings
(and Gtk2Hs converts them into UTF-8, as Gtk+ expects).
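
To make that conversion concrete, here is a minimal sketch of encoding a
String (a list of Unicode code points) as UTF-8 bytes. It is not the code
Gtk2Hs actually uses; the names encodeChar and encodeUtf8 are made up for
illustration, and surrogate code points are not rejected.

```haskell
import Data.Bits (shiftR, (.&.), (.|.))
import Data.Char (ord)
import Data.Word (Word8)

-- Encode one Unicode code point as 1 to 4 UTF-8 bytes.
-- (Hypothetical helper, not part of Gtk2Hs.)
encodeChar :: Char -> [Word8]
encodeChar c
  | n < 0x80    = [fromIntegral n]
  | n < 0x800   = [lead 0xC0 6, cont 0]
  | n < 0x10000 = [lead 0xE0 12, cont 6, cont 0]
  | otherwise   = [lead 0xF0 18, cont 12, cont 6, cont 0]
  where
    n = ord c
    -- Leading byte: length tag plus the topmost payload bits.
    lead tag s = fromIntegral (tag .|. (n `shiftR` s))
    -- Continuation byte: 10xxxxxx carrying six payload bits.
    cont s = fromIntegral (0x80 .|. ((n `shiftR` s) .&. 0x3F))

-- Encode a whole String.
encodeUtf8 :: String -> [Word8]
encodeUtf8 = concatMap encodeChar
```

The point is only that the String holds code points, and the byte-level
encoding happens at the boundary with Gtk+.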

> But how to define them in source file? I get "lexical error in string/character
> literal" message when compiling using GHC-6.4.1.
> 
> I tried to bypass it by using koi8-r in sources and converting strings to UTF-8
> on the fly using ffi binding to iconv I found in MissingH (iirc). It works fine
> on Linux, though it is a little awkward, but I get weird results on Win32.
> 
> Is there any way to use Unicode strings in sources?

At the moment it's not easy since GHC currently doesn't interpret source
files in any Unicode encoding (though I believe in the next version it
will expect source files to be UTF-8).

If you're desperate you can use a hack like this:

map chr [0x647,0x644,32,0x62A,0x62C,0x62F,0x646,32]  -- chr is from Data.Char
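
The same hack applied to a Russian word might look like the sketch below.
The names greeting and greeting' are made up for illustration. The second
form relies on the numeric character escapes that Haskell 98 defines for
string literals, so the source file itself stays pure ASCII; whether a given
compiler version accepts them is worth checking.

```haskell
import Data.Char (chr)

-- The Russian greeting "privet", built from Unicode code points.
greeting :: String
greeting = map chr [0x41F, 0x440, 0x438, 0x432, 0x435, 0x442]

-- The same six characters written with hexadecimal escapes.
greeting' :: String
greeting' = "\x41F\x440\x438\x432\x435\x442"
```

Both values are ordinary Strings of six code points and can be passed on
as they are.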

Or if you read text in from a file then you could decode that into a
String.

One thing that you certainly can't do is read in a UTF-8 file and then
display it in a GUI without doing any decoding. That's because, as I
said, a String in Haskell is supposed to be Unicode (in UCS-4 encoding)
and Gtk2Hs interprets it as such, so if you read in UTF-8 text then you
must decode it into a normal Haskell String.
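
A minimal sketch of such a decoding step follows. It assumes the input is
already available as a list of bytes; the name decodeUtf8 is made up for
illustration and is not taken from any library. Malformed sequences become
U+FFFD, and overlong encodings are not rejected, so treat it as a starting
point rather than a complete decoder.

```haskell
import Data.Bits (shiftL, (.&.), (.|.))
import Data.Char (chr)
import Data.Word (Word8)

-- Decode UTF-8 bytes into a String of Unicode code points.
-- (Hypothetical helper; malformed input is replaced by U+FFFD.)
decodeUtf8 :: [Word8] -> String
decodeUtf8 [] = []
decodeUtf8 (b:bs)
  | b < 0x80              = chr (fromIntegral b) : decodeUtf8 bs
  | b >= 0xC0 && b < 0xE0 = multi 1 (b .&. 0x1F)
  | b >= 0xE0 && b < 0xF0 = multi 2 (b .&. 0x0F)
  | b >= 0xF0 && b < 0xF8 = multi 3 (b .&. 0x07)
  | otherwise             = '\xFFFD' : decodeUtf8 bs
  where
    -- Consume n continuation bytes after a leading byte.
    multi n lead
      | length cs == n && all isCont cs && v <= 0x10FFFF
                  = chr v : decodeUtf8 rest
      | otherwise = '\xFFFD' : decodeUtf8 bs
      where
        (cs, rest) = splitAt n bs
        v = foldl step (fromIntegral lead) cs
    -- Continuation bytes have the bit pattern 10xxxxxx.
    isCont c = c .&. 0xC0 == 0x80
    -- Append six payload bits to the accumulated code point.
    step acc c = (acc `shiftL` 6) .|. fromIntegral (c .&. 0x3F)
```

If the file contents arrive as a String with one Char per byte, something
like map (fromIntegral . ord) would be needed first to get the bytes back.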

Sorry, it's all rather unsatisfactory. The missing pieces here are
support for Unicode string literals and an IO library that makes decoding
text files into normal Haskell Strings easy.

Duncan


