[Haskell-cafe] Text.JSON and utf8

Gregory Collins greg at gregorycollins.net
Mon Feb 11 15:14:09 CET 2013


Don't use the json package; use aeson instead. It's much faster and
handles encoding issues correctly.
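
To make the encoding point concrete, here is a minimal sketch using only
base (it calls neither aeson nor json). A Haskell Char is a Unicode code
point, so 'ö' is 246 (U+00F6). The single byte F6 in the CouchDB log is that
code point truncated to one byte, which is not valid UTF-8; the UTF-8
encoding of 'ö' is the two bytes C3 B6. The names encodeCharUtf8, encodeUtf8,
latin1Bytes and misreadAsLatin1 are made up for this illustration and are
not part of any library.

```haskell
import Data.Bits (shiftR, (.&.), (.|.))
import Data.Char (chr, ord)
import Data.Word (Word8)

-- | Encode one code point as UTF-8 bytes. Hand-rolled for illustration;
-- surrogate code points are not rejected here.
encodeCharUtf8 :: Char -> [Word8]
encodeCharUtf8 c
  | n < 0x80    = [fromIntegral n]
  | n < 0x800   = [0xC0 .|. hi 6, cont 0]
  | n < 0x10000 = [0xE0 .|. hi 12, cont 6, cont 0]
  | otherwise   = [0xF0 .|. hi 18, cont 12, cont 6, cont 0]
  where
    n      = ord c
    -- leading payload bits, shifted down to the bottom of the lead byte
    hi s   = fromIntegral (n `shiftR` s)
    -- continuation byte: 10xxxxxx carrying six payload bits
    cont s = 0x80 .|. fromIntegral ((n `shiftR` s) .&. 0x3F)

-- | UTF-8 encode a whole String.
encodeUtf8 :: String -> [Word8]
encodeUtf8 = concatMap encodeCharUtf8

-- | What goes wrong on the way out: each Char truncated to a single byte.
latin1Bytes :: String -> [Word8]
latin1Bytes = map (fromIntegral . ord)

-- | What goes wrong on the way back: each byte read as its own Char.
misreadAsLatin1 :: [Word8] -> String
misreadAsLatin1 = map (chr . fromIntegral)
```

In ghci, latin1Bytes "ö" gives [246] (the rejected byte), encodeUtf8 "ö"
gives [195,182], and misreadAsLatin1 [195,182] gives "\195\182", a
two-character string of the kind your getDoc call returned. That suggests
the response bytes are not being UTF-8 decoded before they reach you, though
I have not checked the CouchDB bindings to confirm it.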

G


On Mon, Feb 11, 2013 at 2:56 PM, Martin Hilbig <lists at mhilbig.de> wrote:

> hi,
>
> tl;dr: i propose this patch to Text/JSON/String.hs and would like to
> know why it is needed:
>
> @@ -375,7 +375,7 @@
>    where
>    go s1 =
>      case s1 of
> -      (x   :xs) | x < '\x20' || x > '\x7e' -> '\\' : encControl x (go xs)
> +      (x   :xs) | x < '\x20' -> '\\' : encControl x (go xs)
>        ('"' :xs)              -> '\\' : '"'  : go xs
>        ('\\':xs)              -> '\\' : '\\' : go xs
>        (x   :xs)              -> x    : go xs
>
>
> i recently stumbled upon CouchDB telling me i'm sending invalid json.
>
> i basically read lines from a utf8 file with german umlauts and send
> them to CouchDB using Text.JSON and Database.CouchDB.
>
>   $ file lines.txt
>   lines.txt: UTF-8 Unicode text
>
> let's take 'ö' as an example. i use LANG=de_DE.utf8
>
> ghci tells
>
> > 'ö'
> '\246'
>
> > putChar '\246'
> ö
>
> > putChar 'ö'
> ö
>
> > :m + Text.JSON Database.CouchDB
> > runCouchDB' $ newNamedDoc (db "foo") (doc "bar") (showJSON $ toJSObject
> [("test","ö")])
> *** Exception: HTTP/1.1 400 Bad Request
> Server: CouchDB/1.2.1 (Erlang OTP/R15B03)
> Date: Mon, 11 Feb 2013 13:24:49 GMT
> Content-Type: text/plain; charset=utf-8
> Content-Length: 48
> Cache-Control: must-revalidate
>
> couchdb log says:
>
>   Invalid JSON: {{error,{10,"lexical error: invalid bytes in UTF8
> string.\n"}},<<"{\"test\":\"<F6>\"}">>}
>
> this is indeed hex ö:
>
> > :m + Numeric
> > putChar $ toEnum $ fst $ head $ readHex "f6"
> ö
>
> if i apply the above patch and reinstall JSON and CouchDB the doc
> creation works:
>
> > runCouchDB' $ newNamedDoc (db "db") (doc "foo") (showJSON $ toJSObject
> [("test", "ö")])
> Right someRev
>
> but i don't get back the ö i expected:
>
> > Just (_,_,x) <-runCouchDB' $ getDoc (db "foo") (doc "bar") :: IO (Maybe
> (Doc,Rev,JSObject String))
> > let Ok y = valFromObj "test" =<< readJSON x :: Result String
> > y
> "\195\188"
> > putStrLn y
> ü
>
> apparently with curl everything works fine:
>
> $ curl localhost:5984/db/foo -XPUT -d '{"test": "ö"}'
> {"ok":true,"id":"foo","rev":"someOtherRev"}
> $ curl localhost:5984/db/foo
> {"_id":"bars","_rev":"someOtherRev","test":"ö"}
>
> so how can i get my precious ö back? what am i doing wrong or does
> Text.JSON need another patch?
>
> another question: why does encControl in Text/JSON/String.hs handle the
> cases x < '\x100' and x < '\x1000' even though they can never be
> reached with the old predicate in encJSString (x < '\x20')?
>
> finally: is '\x7e' the right literal for the job?
>
> thanks for reading
>
> have fun
> martin
>
> _______________________________________________
> Haskell-Cafe mailing list
> Haskell-Cafe at haskell.org
> http://www.haskell.org/mailman/listinfo/haskell-cafe
>



-- 
Gregory Collins <greg at gregorycollins.net>