[Haskell-cafe] What is the state of Unicode in Haskell implementations?

Duncan Coutts duncan.coutts at worc.ox.ac.uk
Mon Jul 31 08:07:56 EDT 2006


On Mon, 2006-07-31 at 13:56 +0200, Olof Bjarnason wrote:
> Hi there!

> I'm trying to use Haskell as a code-generating language, specifically
> generating C# code files. The wish list is

> 1) reading UTF-8 coded text files into Unicode-enabled Strings, let's
> call them UString

The ordinary Haskell String type is already "Unicode-enabled": a String
is a list of Char, and each Char holds a Unicode code point.
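
To make that concrete, here is a small sketch. The helper name
codePoints is made up for illustration; ord and maxBound are standard.

```haskell
import Data.Char (ord)

-- Hypothetical helper: list the Unicode code point of each Char.
codePoints :: String -> [Int]
codePoints = map ord

-- The largest code point a Char can hold (U+10FFFF).
largestCodePoint :: Int
largestCodePoint = ord (maxBound :: Char)
```

For example, codePoints "\955x" gives [955,120] (U+03BB is the Greek
letter lambda), and largestCodePoint is 1114111.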

> 2) writing UStrings to UTF-8 coded text files
> 3) using unicode strings in-code, that is in my .hs files
> 
> I can live without 3), and with a little goodwill also 2), but 1) is
> harder since I cannot really hope my input files (meta-data-files) are
> coded in anything other than UTF-8.

You can do 1) and 2) now with a little extra code for decoding and
encoding UTF-8. You will be able to do 3) in GHC 6.6.

For 1) and 2), grab some UTF-8 code from somewhere:

encode, decode :: String -> String

and define

readFileUTF8 fname = fmap decode (readFile fname)
writeFileUTF8 fname content = writeFile fname (encode content)

So all internal processing happens on String, which is Unicode, and you
encode and decode only at the boundary, when you write or read UTF-8
encoded files.
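
In case it helps, here is a minimal sketch of what such encode and
decode functions might look like. It is only one possible way to write
them, not code from any particular library. It assumes the encoded
String carries one byte per Char (values 0 to 255). It also opens the
files in binary mode so that this assumption holds even on GHC versions
whose text-mode handles do their own decoding. Malformed input becomes
U+FFFD, and overlong forms and surrogates are not rejected.

```haskell
import Data.Bits (shiftL, shiftR, (.&.), (.|.))
import Data.Char (chr, ord)
import System.IO
  (IOMode (ReadMode, WriteMode), hGetContents, hPutStr, openBinaryFile, withBinaryFile)

-- Encode Unicode text as UTF-8, one output Char (0..255) per byte.
encode :: String -> String
encode = concatMap encodeChar

encodeChar :: Char -> String
encodeChar c
  | n < 0x80    = [chr n]
  | n < 0x800   = [chr (0xC0 .|. (n `shiftR` 6)), cont n]
  | n < 0x10000 = [chr (0xE0 .|. (n `shiftR` 12)), cont (n `shiftR` 6), cont n]
  | otherwise   = [ chr (0xF0 .|. (n `shiftR` 18)), cont (n `shiftR` 12)
                  , cont (n `shiftR` 6), cont n ]
  where
    n = ord c
    -- Continuation byte: 10xxxxxx carrying the low six bits of x.
    cont x = chr (0x80 .|. (x .&. 0x3F))

-- Decode a String of bytes (one per Char) from UTF-8.
-- Assumption: every input Char is in the range 0..255.
decode :: String -> String
decode [] = []
decode (c : cs)
  | b < 0x80           = c : decode cs
  | b .&. 0xE0 == 0xC0 = multi 1 (b .&. 0x1F)
  | b .&. 0xF0 == 0xE0 = multi 2 (b .&. 0x0F)
  | b .&. 0xF8 == 0xF0 = multi 3 (b .&. 0x07)
  | otherwise          = replacement : decode cs
  where
    b = ord c
    -- Consume k continuation bytes after a lead byte with payload 'lead'.
    multi k lead =
      let (trail, rest) = splitAt k cs
      in if length trail == k && all isCont trail
           then toChar (foldl step lead trail) : decode rest
           else replacement : decode cs
    step acc t = (acc `shiftL` 6) .|. (ord t .&. 0x3F)
    isCont t = ord t .&. 0xC0 == 0x80
    -- Guard against values beyond the Unicode range, where chr would fail.
    toChar v
      | v <= 0x10FFFF = chr v
      | otherwise     = replacement
    replacement = '\xFFFD'

-- Read a UTF-8 file into a Unicode String (lazily, like readFile).
readFileUTF8 :: FilePath -> IO String
readFileUTF8 fname = fmap decode (openBinaryFile fname ReadMode >>= hGetContents)

-- Write a Unicode String to a file as UTF-8.
writeFileUTF8 :: FilePath -> String -> IO ()
writeFileUTF8 fname content =
  withBinaryFile fname WriteMode (\h -> hPutStr h (encode content))
```

With this, encode "\233" gives the two bytes "\xC3\xA9", and
decode (encode s) returns s for any String s.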

Duncan


