[Haskell-beginners] hGetContents, unicode and linux

Michael Snoyman michael at snoyman.com
Sun Nov 28 02:19:58 EST 2010


On Sun, Nov 28, 2010 at 8:53 AM, Yitzchak Gale <gale at sefer.org> wrote:
> Michael Snoyman wrote:
>> Perhaps a silly question, but are you certain that the input file is
>> valid UTF-8?
>
> That is a very good point.
>
>> You could also try using the readFile from utf8-string...
>> [or] read the contents as a lazy
>> bytestring and then use the decode functions...
>
> Those approaches are now both deprecated. Either do
> what you are doing, which gives you conceptually simple
> strings as lists of Char. Or, for better efficiency, use
> the text package:
>
>>    import qualified Data.Text.Lazy.IO as T
>>    main :: IO ()
>>    main
>>     = do   text <- T.readFile "unicode.txt"
>>            T.putStr text
>
> In any case, you still need to have the correct encoding
> set on the handles as before. (And the input needs to
> be valid for your selected encoding.)

Which is why I would actually recommend sticking with the
bytestring/text combination when you know what the file encoding will
be and it is not system-dependent: you read the raw bytes and decode
them explicitly, so the result does not depend on the handle's
encoding. It's the approach that I use with Hamlet et al. for
precisely that reason.
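
A minimal sketch of that bytestring/text combination, assuming the
file is meant to be UTF-8. The file name "unicode.txt" is taken from
the quoted example; the helper name decodeFile is made up for
illustration. It reads the raw bytes, decodes them explicitly, and
reports invalid input instead of throwing:

```haskell
import qualified Data.ByteString.Lazy as BL
import qualified Data.Text.Lazy as TL
import qualified Data.Text.Lazy.Encoding as TLE
import qualified Data.Text.Lazy.IO as TLIO
import System.IO (hPutStrLn, hSetEncoding, stderr, stdout, utf8)

-- Hypothetical helper: decode raw bytes as UTF-8, independent of the
-- locale. decodeUtf8' returns Left on invalid input rather than
-- throwing an exception.
decodeFile :: BL.ByteString -> Either String TL.Text
decodeFile = either (Left . show) Right . TLE.decodeUtf8'

main :: IO ()
main = do
  -- Reading as a ByteString does no decoding, so the handle's
  -- encoding plays no part on the input side.
  bytes <- BL.readFile "unicode.txt"
  case decodeFile bytes of
    Left err -> hPutStrLn stderr ("unicode.txt is not valid UTF-8: " ++ err)
    Right text -> do
      -- The output handle still needs an encoding; set it explicitly
      -- here rather than relying on the locale.
      hSetEncoding stdout utf8
      TLIO.putStr text
```

Only the output side still depends on a handle encoding, and it is
set explicitly rather than inherited from the environment.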

Michael

