[Haskell-cafe] How to input Unicode string in Haskell program?

Alexander V Vershilov alexander.vershilov at gmail.com
Thu Feb 21 12:07:18 CET 2013


The problem is that Prelude.getLine decodes its input using the current
locale. For example, if you have a UTF-8 locale, everything works out of the box:

> $ runhaskell 1.hs
> résumé 履歴書 резюме
> résumé 履歴書 резюме

But if you change the locale, you'll get an error:

> $ LANG="C" runhaskell 1.hs
> résumé 履歴書 резюме
> 1.hs: <stdin>: hGetLine: invalid argument (invalid byte sequence)
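
To check which encoding GHC derived from the locale, you can ask the handle
directly. This is only a minimal sketch: hGetEncoding comes from System.IO,
while describeEncoding is a helper name made up for this example. What it
prints depends on your environment.

```haskell
import System.IO (hGetEncoding, stdin)

-- Hypothetical helper: render the result of hGetEncoding.
-- Nothing means the handle is in binary mode (no text decoding at all).
describeEncoding :: Show a => Maybe a -> String
describeEncoding Nothing    = "binary (no text encoding)"
describeEncoding (Just enc) = show enc

main :: IO ()
main = do
    -- The encoding stdin was given at startup, based on the locale.
    enc <- hGetEncoding stdin
    putStrLn ("stdin encoding: " ++ describeEncoding enc)
```

Running it once normally and once with LANG="C" should show the difference
that leads to the error above.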

To force Haskell to use UTF-8, you can read the string as a byte sequence
and decode it as UTF-8 text, for example:

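import qualified Data.ByteString as S
import qualified Data.ByteString.Char8 as S8
import qualified Data.Text.Encoding as T

main :: IO ()
main = do
    -- Read raw bytes (no locale decoding), then decode them as UTF-8.
    x <- fmap T.decodeUtf8 S.getLine
    -- Write the text back as UTF-8 bytes, again bypassing the locale.
    S8.putStrLn (T.encodeUtf8 x)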

Now the code works even under a different locale: the input from the shell
is always decoded as UTF-8, independently of the user's settings.
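
Another option, if you would rather keep String and the ordinary Prelude
functions, is to override the encoding on the handles themselves. A minimal
sketch, assuming the terminal really does send and accept UTF-8 bytes;
hSetEncoding and utf8 are from System.IO, and forceUtf8 is a helper name
made up for this example:

```haskell
import System.IO

-- Hypothetical helper: switch every given handle to UTF-8,
-- replacing whatever encoding was derived from the locale.
forceUtf8 :: [Handle] -> IO ()
forceUtf8 = mapM_ (`hSetEncoding` utf8)

main :: IO ()
main = do
    forceUtf8 [stdin, stdout]
    -- getLine and putStrLn now decode and encode as UTF-8.
    x <- getLine
    putStrLn x
```

This only changes how GHC interprets the bytes; it does not change what the
console itself sends, so it may not be enough on a Windows console that is
tied to a code page.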

--
Alexander


On 21 February 2013 13:58, Semyon Kholodnov <joker.vd at gmail.com> wrote:

> Imagine we have this simple program:
>
> module Main(main) where
>
> main = do
>     x <- getLine
>     putStrLn x
>
> Now I want to run it somehow, enter "résumé 履歴書 резюме" and see this
> string printed back as "résumé 履歴書 резюме". Now, the first problem is
> that my computer runs Windows, which means that I can't use ghci
> ":main" or result of "ghc main.hs" to enter such an outrageous string
> — Windows console is locked to one specific local code page, and no
> codepage contains Latin-1, Cyrillic and Kanji symbols at the same
> time.
>
> But there is also WinGHCi. So I do ":main", copy-paste this string
> into the window (it works, because Windows has had Unicode support for
> 20 years now), but the output is all messed up. In a rather curious way,
> actually: the input string is converted to a UTF-8 byte string, and its
> bytes are treated as characters from my local code page.
>
> So, it appears that I have no way to enter Unicode strings into my
> Haskell programs by hand; I have to read them from files. That's sad,
> and I refuse to think I am the first one with such a problem, so I
> assume there is a solution/workaround. Now would someone please tell
> me this solution? Apart from "Just stick to 127 letters of ASCII", of
> course.

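The garbling described in the quoted message (UTF-8 bytes shown as
characters of a single-byte code page) can be reproduced in pure code. This
is only an illustration: Latin-1 stands in for the local code page, which the
message does not name, and mojibake is a name made up for this example.

```haskell
import qualified Data.Text as T
import qualified Data.Text.Encoding as TE

-- Encode the text as UTF-8, then misread every byte as one character.
-- Assumption: Latin-1 plays the role of the (unnamed) local code page.
mojibake :: T.Text -> T.Text
mojibake = TE.decodeLatin1 . TE.encodeUtf8

main :: IO ()
main =
    -- "\233" is e-acute; its two UTF-8 bytes come back as two characters.
    print (mojibake (T.pack "r\233sum\233"))
```

Each non-ASCII character turns into two or more unrelated characters, which
matches the messed-up output seen in WinGHCi.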