[Haskell-cafe] Data.Text UTF-8 question

jeff p mutjida at gmail.com
Fri Aug 31 07:59:26 CEST 2012


Hello,

I have a sample file (attached) which I cannot read into Text:

    Prelude Control.Applicative> Data.Text.IO.readFile "foo"
    *** Exception: utf8.txt: hGetContents: invalid argument (invalid byte sequence)

    Prelude Control.Applicative> Data.Text.Encoding.decodeUtf8 <$> Data.ByteString.Char8.readFile "foo"
    "*** Exception: Cannot decode byte '\x6e': Data.Text.Encoding.decodeUtf8: Invalid UTF-8 stream

So it seems that foo doesn't contain valid UTF-8. However,
System.IO.UTF8 reads the data without raising an error, and shows
\65533 (U+FFFD) where the offending byte was:

    Prelude Control.Applicative> System.IO.UTF8.readFile "foo"
    "3591,,,dihigma99h,1905,5,25,CUBA,,Matanzas,1971,5,20,CUBA,,Cienfuegos,Martin,Dihigo,,Mart\65533n Magdaleno Dihigo (Llanos),,190,74,R,R,,,,dihigma99,dihigma99,dihim001,dihigma99,dihigma99\r\n"

Shouldn't these all have the same behavior?
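
For comparison, here is a minimal sketch of how the text package can be
asked to substitute U+FFFD instead of throwing. It is not a diagnosis of
the attached file: decodeLenient and sample are hypothetical names, and
the bytes in sample are an assumption (a lone 0xED, which is how Latin-1
would encode an accented i, followed by 'n') chosen only to be consistent
with the '\x6e' in the error above. decodeUtf8With, lenientDecode and
decodeUtf8' come from the text package.

```haskell
import qualified Data.ByteString as B
import qualified Data.Text as T
import qualified Data.Text.Encoding as TE
import Data.Text.Encoding.Error (lenientDecode)

-- Decode bytes as UTF-8, replacing each invalid byte with U+FFFD
-- instead of throwing an exception.
decodeLenient :: B.ByteString -> T.Text
decodeLenient = TE.decodeUtf8With lenientDecode

-- Hypothetical sample (an assumption, not the real attachment):
-- the ASCII bytes for "Mart", a lone 0xED, then 'n' (0x6e).
sample :: B.ByteString
sample = B.pack [0x4d, 0x61, 0x72, 0x74, 0xed, 0x6e]

main :: IO ()
main = do
  -- Lenient decoding yields a Text with a replacement character.
  print (decodeLenient sample)
  -- decodeUtf8' reports the failure as a Left value rather than
  -- raising it lazily from inside the result.
  print (TE.decodeUtf8' sample)
```

Under the same assumption, the file itself could be read with
decodeLenient <$> Data.ByteString.readFile "foo", which avoids the
locale-dependent decoding that Data.Text.IO.readFile performs.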

I am running on Mac OS X 10.8.1, with GHC 7.4.2 and text-0.11.2.3.

Thanks for any insight,
  Jeff
-------------- next part --------------
A non-text attachment was scrubbed...
Name: foo
Type: application/octet-stream
Size: 182 bytes
Desc: not available
URL: <http://www.haskell.org/pipermail/haskell-cafe/attachments/20120830/e3c799e6/attachment.obj>

