[commit: base] master: Big patch to improve Unicode support in GHC. Validated on OS X and Windows, this (509f28c)
Simon Peyton-Jones
simonpj at microsoft.com
Mon May 16 12:25:47 CEST 2011
A big thank-you to Max for pushing this change through. It was not just a question of hacking, but also of running a discussion about the spec and establishing a consensus in a rather complex area. Thank you and well done!
Simon
| -----Original Message-----
| From: cvs-libraries-bounces at haskell.org [mailto:cvs-libraries-bounces at haskell.org] On
| Behalf Of Max Bolingbroke
| Sent: 14 May 2011 23:06
| To: cvs-libraries at haskell.org
| Subject: [commit: base] master: Big patch to improve Unicode support in GHC.
| Validated on OS X and Windows, this (509f28c)
|
| Repository : ssh://darcs.haskell.org//srv/darcs/packages/base
|
| On branch : master
|
| http://hackage.haskell.org/trac/ghc/changeset/509f28cc93b980d30aca37008cbe66c677a0d6f6
|
| >---------------------------------------------------------------
|
| commit 509f28cc93b980d30aca37008cbe66c677a0d6f6
| Author: Max Bolingbroke <batterseapower at hotmail.com>
| Date: Sat May 14 22:50:46 2011 +0100
|
| Big patch to improve Unicode support in GHC. Validated on OS X and Windows, this
| patch series fixes #5061, #1414, #3309, #3308, #3307, #4006 and #4855.
|
| The major changes are:
|
| 1) Make Foreign.C.String.*CString use the locale encoding
|
| This change follows the FFI specification in Haskell 98, which
| has never actually been implemented before.
|
| The functions exported from Foreign.C.String are partially-applied
| versions of those from GHC.Foreign, which allows the user to supply
| their own TextEncoding.
|
| We also introduce foreignEncoding as the name of the text encoding
| that follows the FFI appendix in that it transliterates encoding
| errors.
|
| 2) I also changed the code so that mkTextEncoding always tries the
| native-Haskell decoders in preference to those from iconv, even on
| non-Windows. The motivation here is simply that it is better for
| compatibility if we do this, and those are the ones you get for
| the utf* and latin1* predefined TextEncodings anyway.
|
| 3) Implement surrogate-byte error handling mode for TextEncoding
|
| This implements PEP383-like behaviour so that we are able to
| roundtrip byte strings through Strings without loss of information.
|
| The withFilePath function now uses this encoding to get to/from CStrings,
| so any code that uses that will get the right PEP383 behaviour automatically.
|
| 4) Implement three other coding failure modes: ignore, throw error,
| transliterate
|
| These mimic the behaviour of the GNU Iconv extensions.
|
| Control/Exception/Base.hs | 2 +-
| Foreign/C/String.hs | 44 +++++++-
| GHC/Conc/Windows.hs | 16 +--
| GHC/Environment.hs | 36 +++++-
| GHC/Foreign.hs | 255 ++++++++++++++++++++++++++++++++++++++++
| GHC/IO.hs | 14 ++-
| GHC/IO/Encoding.hs | 78 +++++++++----
| GHC/IO/Encoding.hs-boot | 6 +
| GHC/IO/Encoding/CodePage.hs | 96 ++++++++-------
| GHC/IO/Encoding/Failure.hs | 129 ++++++++++++++++++++
| GHC/IO/Encoding/Iconv.hs | 114 ++++++------------
| GHC/IO/Encoding/Latin1.hs | 77 +++++++------
| GHC/IO/Encoding/Types.hs | 37 ++++--
| GHC/IO/Encoding/UTF16.hs | 149 ++++++++++++------------
| GHC/IO/Encoding/UTF32.hs | 146 +++++++++++++----------
| GHC/IO/Encoding/UTF8.hs | 101 +++++++++-------
| GHC/IO/FD.hs | 11 +--
| GHC/IO/Handle/Internals.hs | 42 ++++++-
| GHC/Windows.hs | 44 +++++++
| System/Environment.hs | 219 ++++++++++++++++++++++++++++-------
| System/IO.hs | 2 +-
| System/Posix/Internals.hs | 32 +++++
| System/Posix/Internals.hs-boot | 7 +
| base.cabal | 3 +
| 24 files changed, 1214 insertions(+), 446 deletions(-)
|
|
| Diff suppressed because of size. To see it, use:
|
| git show 509f28cc93b980d30aca37008cbe66c677a0d6f6
|
| _______________________________________________
| Cvs-libraries mailing list
| Cvs-libraries at haskell.org
| http://www.haskell.org/mailman/listinfo/cvs-libraries
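
For readers who want to try the behaviour described in points 1), 3) and 4) of the commit message, here is a minimal sketch. It assumes the GHC.Foreign marshalling functions take an explicit TextEncoding as their first argument, and that mkTextEncoding accepts the iconv-style suffixes "//IGNORE", "//TRANSLIT" and "//ROUNDTRIP", as documented for later releases of base. Whether the spellings in this exact commit match is an assumption. The helper names roundTripWith, decodeBytesWith and encodeBytesWith are made up for this illustration and are not part of base.

```haskell
import qualified GHC.Foreign as GHC
import GHC.IO.Encoding (TextEncoding, mkTextEncoding, utf8)
import Foreign.Marshal.Array (peekArray, withArrayLen)
import Foreign.Ptr (castPtr)
import Data.Word (Word8)

-- Hypothetical helper: marshal a String to a C string and straight back,
-- using a caller-supplied encoding instead of the locale encoding.
roundTripWith :: TextEncoding -> String -> IO String
roundTripWith enc s = GHC.withCString enc s (GHC.peekCString enc)

-- Hypothetical helper: decode raw bytes with a named encoding,
-- e.g. "UTF-8//ROUNDTRIP" (suffix spelling assumed from later base docs).
decodeBytesWith :: String -> [Word8] -> IO String
decodeBytesWith name bytes = do
  enc <- mkTextEncoding name
  withArrayLen bytes $ \len ptr ->
    GHC.peekCStringLen enc (castPtr ptr, len)

-- Hypothetical helper: encode a String back to raw bytes
-- with a named encoding.
encodeBytesWith :: String -> String -> IO [Word8]
encodeBytesWith name s = do
  enc <- mkTextEncoding name
  GHC.withCStringLen enc s $ \(ptr, len) ->
    peekArray len (castPtr ptr)
```

With the round-trip mode, decoding a byte sequence that is not valid UTF-8 and then encoding the resulting String again should give back the original bytes, which is the property the commit message describes for withFilePath. Passing utf8 to roundTripWith shows the explicit-encoding style of GHC.Foreign, as opposed to the locale-dependent functions in Foreign.C.String.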