[Haskell-cafe] UTF-8 problems when decoding JSON data coming from Network.HTTP

Michael Snoyman michael at snoyman.com
Sun Oct 17 08:37:41 EDT 2010

On Sun, Oct 17, 2010 at 2:26 PM, Ionut G. Stan <ionut.g.stan at gmail.com> wrote:
> On 17/Oct/10 8:02 AM, Michael Snoyman wrote:
>> In the gist you sent, the problem is that you are reading the HTTP
>> response as a String. The HTTP library doesn't deal well with
>> non-Latin characters when doing String requests; you should be using
>> ByteString and then converting. It's a little tedious using the HTTP
>> library with ByteStrings, which is one of the reasons I wrote
>> http-enumerator. Here's some working code. The main point is to
>> convert the UTF8 octets to a String.
>> You could also consider using one of the JSON libraries that support
>> bytestrings directly instead of strings, which will likely result in
>> much better performance. Contenders include JSONb[1] and
>> yajl-enumerator[2].
>> import Network.HTTP.Enumerator
>> import qualified Text.JSON as JSON
>> import qualified Data.ByteString.Lazy.UTF8 as BSLU
>> data GithubUser = GithubUser {
>>         name     :: String,
>>         location :: String
>>     } deriving (Eq, Show)
>> instance JSON.JSON GithubUser where
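>>     -- Expects {"user": {"name": ..., "location": ...}}; any other shape
>>     -- becomes a JSON.Error rather than a runtime pattern-match failure.
>>     readJSON (JSON.JSObject object) = do
>>         userVal <- lookupM "user" (JSON.fromJSObject object)
>>         user <- case userVal of
>>             JSON.JSObject b -> return (JSON.fromJSObject b)
>>             _               -> JSON.Error "\"user\" is not a JSON object"
>>         name     <- lookupM "name"     user >>= JSON.readJSON
>>         location <- lookupM "location" user >>= JSON.readJSON
>>         return GithubUser { name = name, location = location }
>>     readJSON _ = JSON.Error "Expected a JSON object"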
>>     showJSON user = JSON.makeObj [
>>                         ("name",     JSON.showJSON $ name user),
>>                         ("location", JSON.showJSON $ location user)
>>                     ]
>> lookupM :: (Monad m) => String -> [(String, a)] -> m a
>> lookupM x xs = maybe (fail $ "No such element: " ++ x) return (lookup x xs)
>> main = do jsonLbs <- simpleHttp "http://github.com/api/v2/json/user/show/igstan"
>>           let jsonText = BSLU.toString jsonLbs
>>           let result = JSON.decode jsonText :: JSON.Result GithubUser
>>           showResult result
>>        where showResult (JSON.Ok json) = putStrLn $ name json
>>              showResult (JSON.Error e) = putStrLn e
>> Michael
>> [1] http://hackage.haskell.org/package/JSONb-1.0.2
>> [2] http://hackage.haskell.org/package/yajl-enumerator
> Thanks Michael, now it works indeed. But I don't understand, is there any
> inherent problem with Haskell's built-in String? Should one choose
> ByteString when dealing with Unicode stuff? Or, is there any resource that
> describes in one place all the problems Haskell has with Unicode?

There's no problem with String; you just need to remember what it
means. A String is a list of Chars, and a Char is a Unicode code point.
The HTTP protocol, on the other hand, deals in *bytes*, not Unicode
code points. To convert between the two you need a character
encoding; in the case of JSON, I believe this is always specified as
UTF-8.
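
To make the Char/byte distinction concrete, here is a small sketch (not part of the original thread) using the `text` and `bytestring` packages; the helper names `codePoints` and `utf8Bytes` are hypothetical. It shows that the four-Char String "café" turns into five bytes once it is UTF-8 encoded, because U+00E9 needs two bytes.

```haskell
import qualified Data.ByteString as BS
import qualified Data.Text as T
import qualified Data.Text.Encoding as TE
import Data.Char (ord)
import Data.Word (Word8)

-- A String is a list of Chars, and each Char is one Unicode code point.
codePoints :: String -> [Int]
codePoints = map ord

-- Turning code points into bytes requires choosing an encoding;
-- this hypothetical helper chooses UTF-8 via the text package.
utf8Bytes :: String -> [Word8]
utf8Bytes = BS.unpack . TE.encodeUtf8 . T.pack
```

`codePoints "caf\233"` is `[99,97,102,233]` (four code points), whereas `utf8Bytes "caf\233"` is `[99,97,102,195,169]` (five bytes).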

The problem for you is that the HTTP package does *not* perform UTF-8
decoding of the raw bytes sent over the network. Instead, I believe it
is doing the naive byte-to-codepoint conversion, aka Latin-1 decoding.
By downloading the data as bytes (i.e., a ByteString), you can then
explicitly state that you want UTF-8 decoding instead of Latin-1.
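
A sketch of what goes wrong, assuming (as described above) that the String interface maps each byte to exactly one Char. `Data.ByteString.Char8.unpack` performs that same byte-to-Char conversion, so it stands in for the Latin-1 behaviour here; `rawBody`, `asLatin1` and `asUtf8` are hypothetical names.

```haskell
import qualified Data.ByteString as BS
import qualified Data.ByteString.Char8 as BC
import qualified Data.Text as T
import qualified Data.Text.Encoding as TE

-- Hypothetical response body: the UTF-8 encoding of "café".
-- 0xC3 0xA9 is the two-byte UTF-8 sequence for U+00E9 (é).
rawBody :: BS.ByteString
rawBody = BS.pack [0x63, 0x61, 0x66, 0xC3, 0xA9]

-- Naive byte-to-codepoint conversion (Latin-1 decoding):
-- every byte becomes exactly one Char.
asLatin1 :: BS.ByteString -> String
asLatin1 = BC.unpack

-- Explicit UTF-8 decoding: the bytes 0xC3 0xA9 become a single Char.
-- Note that decodeUtf8 throws an exception on malformed UTF-8.
asUtf8 :: BS.ByteString -> String
asUtf8 = T.unpack . TE.decodeUtf8
```

`asLatin1 rawBody` yields the five Chars `"caf\195\169"` (displayed as "cafÃ©"), while `asUtf8 rawBody` yields the four Chars of "café".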

It would be entirely possible to write an HTTP library that does this
automatically, but it would be inherently limited to a single encoding.
By dealing directly with ByteStrings, you can work with any character
encoding, as well as with binary data such as images, which have no
character encoding at all.
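
A minimal sketch of letting the caller pick the encoding once it holds the raw bytes. `BodyEncoding` and `decodeBody` are hypothetical; `decodeUtf8'` (which returns `Left` on malformed input instead of throwing) and `decodeLatin1` come from `Data.Text.Encoding`.

```haskell
import qualified Data.ByteString as BS
import qualified Data.Text as T
import qualified Data.Text.Encoding as TE

-- Hypothetical tag for how the caller wants a response body interpreted.
data BodyEncoding = Utf8 | Latin1
  deriving (Eq, Show)

-- Decode raw bytes according to the caller's choice. UTF-8 decoding can
-- fail on malformed input, so that case is reported as a Left.
decodeBody :: BodyEncoding -> BS.ByteString -> Either String T.Text
decodeBody Utf8   bytes = either (Left . show) Right (TE.decodeUtf8' bytes)
decodeBody Latin1 bytes = Right (TE.decodeLatin1 bytes)
```

Binary data such as an image should simply stay a ByteString and never go through `decodeBody`: a JPEG, for instance, starts with the bytes 0xFF 0xD8, which are not valid UTF-8, so `decodeBody Utf8` would return a `Left`.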


