[Haskell-cafe] How to use Unicode strings?

Don Stewart dons at galois.com
Sun Nov 23 02:36:55 EST 2008


alexey.skladnoy:
> >
> > This upsets me. We need to get on with doing this properly. The
> > System.IO.UTF8 module is a useful interim workaround but we're not using
> > it properly most of the time.
> >
> > ... skipped ...
> >
> > The right thing to do is to make Prelude.putStrLn do the right thing. We
> > had a long discussion on how to fix the H98 IO functions to do this
> > better. We just need to get on with it, or we'll end up with too many
> > cases of people using System.IO.UTF8 inappropriately.
> >
> But this raises the question of what "the right thing" is. If the locale is
> UTF-8, or the system supports Unicode some other way, there is no problem:
> just encode the string properly. The problem is how to deal with
> untranslatable characters. Skip them? Replace them with question marks?
> Something else? We probably need to look at how this is solved in other
> languages (or not solved).
> 
> And this problem is not limited to IO. It arises whenever strings cross the
> border between the Haskell world and the outside world: opening files with
> Unicode names, exec'ing processes, etc.
> 
> For example:
> Prelude> readFile "файл"
> *** Exception: D09;: openFile: does not exist (No such file or directory)
> Prelude> executeFile "echo" True ["Сейчас сломается"] Nothing
> !59G0A A;><05BAO
> 
> Although it's possible to work around this using encodeString/decodeString
> from Codec.Binary.UTF8.String, that won't work on non-UTF-8 systems. It's not
> only Neanderthal systems with one-byte locales; Windows, AFAIK, uses a
> different Unicode encoding.

For just decoding / encoding in other locales, there are codec
libraries. Hunt around on Hackage.

    http://hackage.haskell.org/cgi-bin/hackage-scripts/package/encoding
    http://hackage.haskell.org/cgi-bin/hackage-scripts/package/Encode
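
To make the failure mode concrete, here is a minimal sketch using only base.
The garbled output in the quoted session is consistent with each Char being
truncated to its low 8 bits on the way out, and the first function reproduces
that. The second shows roughly what a UTF-8 encoding step has to do instead.
The names truncateToBytes, encodeChar and encodeUtf8 are made up for this
illustration; they are not the API of either package above, and the encoder
does no error handling (it does not reject surrogate code points, for
instance).

```haskell
import Data.Bits (shiftR, (.&.), (.|.))
import Data.Char (chr, ord)
import Data.Word (Word8)

-- Hypothetical helper: keep only the low 8 bits of every code point.
-- This mimics the lossy behaviour seen in the session above.
truncateToBytes :: String -> String
truncateToBytes = map (chr . (.&. 0xFF) . ord)

-- Hypothetical helper: UTF-8 encode a single Char into 1 to 4 bytes.
-- Simplification: surrogate code points are not rejected.
encodeChar :: Char -> [Word8]
encodeChar c
  | n < 0x80    = [fromIntegral n]
  | n < 0x800   = [lead 0xC0 6, cont 0]
  | n < 0x10000 = [lead 0xE0 12, cont 6, cont 0]
  | otherwise   = [lead 0xF0 18, cont 12, cont 6, cont 0]
  where
    n = ord c

    -- Leading byte: marker bits plus the top bits of the code point.
    lead :: Int -> Int -> Word8
    lead marker shift = fromIntegral (marker .|. (n `shiftR` shift))

    -- Continuation byte: 10xxxxxx carrying six bits of the code point.
    cont :: Int -> Word8
    cont shift = fromIntegral (0x80 .|. ((n `shiftR` shift) .&. 0x3F))

-- Hypothetical helper: UTF-8 encode a whole String.
encodeUtf8 :: String -> [Word8]
encodeUtf8 = concatMap encodeChar
```

With these definitions, truncateToBytes applied to the file name from the
session ("\1092\1072\1081\1083" in escaped form) gives "D09;", which is the
string in the exception message, while encodeUtf8 gives the eight bytes
D1 84 D0 B0 D0 B9 D0 BB. Handling non-UTF-8 locales or untranslatable
characters is exactly what this sketch leaves out.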


-- Don

