[commit: base] master: Big patch to improve Unicode support in GHC. Validated on OS X and Windows, this (509f28c)
Max Bolingbroke
batterseapower at hotmail.com
Sun May 15 00:05:42 CEST 2011
Repository : ssh://darcs.haskell.org//srv/darcs/packages/base
On branch : master
http://hackage.haskell.org/trac/ghc/changeset/509f28cc93b980d30aca37008cbe66c677a0d6f6
>---------------------------------------------------------------
commit 509f28cc93b980d30aca37008cbe66c677a0d6f6
Author: Max Bolingbroke <batterseapower at hotmail.com>
Date: Sat May 14 22:50:46 2011 +0100
Big patch to improve Unicode support in GHC. Validated on OS X and Windows, this
patch series fixes #5061, #1414, #3309, #3308, #3307, #4006 and #4855.
The major changes are:
1) Make Foreign.C.String.*CString use the locale encoding
This change follows the FFI specification in Haskell 98, which
has never actually been implemented before.
The functions exported from Foreign.C.String are partially-applied
versions of those from GHC.Foreign, which allows the user to supply
their own TextEncoding.
We also introduce foreignEncoding, the text encoding that follows
the FFI appendix by transliterating encoding errors.
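
As an illustration only (not code from this patch), the sketch below marshals a String through a C buffer with an explicitly supplied TextEncoding, using the GHC.Foreign variants of withCStringLen and peekCStringLen. The helper names roundTrip and roundTripsEverywhere are hypothetical, and the sketch assumes a base version in which GHC.Foreign exports these functions with the encoding as the first argument.

```haskell
import qualified GHC.Foreign as GHC
import System.IO (TextEncoding, latin1, utf8)

-- Hypothetical helper: encode a String into a temporary C buffer with the
-- given encoding, then decode the same bytes back into a String.
roundTrip :: TextEncoding -> String -> IO String
roundTrip enc s = GHC.withCStringLen enc s (GHC.peekCStringLen enc)

-- Hypothetical helper: check that a string survives the trip under both
-- UTF-8 and Latin-1. Only meaningful for characters below U+0100, since
-- Latin-1 cannot represent anything else.
roundTripsEverywhere :: String -> IO Bool
roundTripsEverywhere s = do
  results <- mapM (\enc -> roundTrip enc s) [utf8, latin1]
  return (all (== s) results)
```

The Foreign.C.String functions behave like these with the encoding argument already fixed, so code that needs a specific encoding regardless of locale would call the GHC.Foreign versions directly.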
2) Make mkTextEncoding always try the native Haskell decoders in
preference to those from iconv, even on non-Windows platforms. This
is better for compatibility, and these are the decoders you already
get from the predefined utf* and latin1* TextEncodings.
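
A minimal sketch of looking up encodings by name with mkTextEncoding and observing the encoded size of a string. The helper name encodedSize is hypothetical. The sketch sticks to UTF-family names, which are assumed to be served by the native Haskell codecs, so it should not depend on iconv being available.

```haskell
import qualified GHC.Foreign as GHC
import System.IO (mkTextEncoding)

-- Hypothetical helper: look up an encoding by name, encode the string with
-- it, and return how many bytes the encoded form occupies (no terminator).
encodedSize :: String -> String -> IO Int
encodedSize encName s = do
  enc <- mkTextEncoding encName
  GHC.withCStringLen enc s (return . snd)
```

For example, encodedSize "UTF-8" "\955" and encodedSize "UTF-16LE" "\955" both yield 2, while encodedSize "UTF-32BE" "\955" yields 4.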
3) Implement surrogate-byte error handling mode for TextEncoding
This implements PEP383-like behaviour so that we are able to
roundtrip byte strings through Strings without loss of information.
The withFilePath function now uses this encoding to get to/from CStrings,
so any code that uses that will get the right PEP383 behaviour automatically.
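
The round-trip property can be sketched as follows. This is an illustration under assumptions, not code from the patch: it relies on the "//ROUNDTRIP" suffix that later System.IO documentation describes for mkTextEncoding, and roundTripBytes is a hypothetical helper name. The input may contain bytes that are not valid UTF-8.

```haskell
import Data.Word (Word8)
import Foreign.Marshal.Array (peekArray, withArrayLen)
import Foreign.Ptr (castPtr)
import qualified GHC.Foreign as GHC
import System.IO (mkTextEncoding)

-- Hypothetical helper: decode raw bytes to a String and encode the String
-- back to bytes, both with the round-tripping UTF-8 encoding. Undecodable
-- bytes are escaped on the way in and unescaped on the way out.
roundTripBytes :: [Word8] -> IO [Word8]
roundTripBytes bytes = do
  enc <- mkTextEncoding "UTF-8//ROUNDTRIP"
  str <- withArrayLen bytes $ \n p -> GHC.peekCStringLen enc (castPtr p, n)
  GHC.withCStringLen enc str $ \(p, n) -> peekArray n (castPtr p)
```

With this encoding, a byte string such as a file name containing a stray 0xFF byte should come back unchanged, which is the behaviour code going through withFilePath is meant to get.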
4) Implement three other coding failure modes: ignore, throw error, transliterate
These mimic the behaviour of the GNU Iconv extensions.
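
The other failure modes can be compared with a small experiment. This sketch assumes the iconv-style suffixes "//IGNORE" and "//TRANSLIT" accepted by mkTextEncoding in later base releases, with no suffix meaning "throw an error". decodeWith is a hypothetical helper name.

```haskell
import Control.Exception (IOException, try)
import Data.Word (Word8)
import Foreign.Marshal.Array (withArrayLen)
import Foreign.Ptr (castPtr)
import qualified GHC.Foreign as GHC
import System.IO (mkTextEncoding)

-- Hypothetical helper: decode raw bytes with the named encoding and capture
-- any IOException raised by the encoding lookup or by a decoding failure.
decodeWith :: String -> [Word8] -> IO (Either IOException String)
decodeWith encName bytes = try $ do
  enc <- mkTextEncoding encName
  withArrayLen bytes $ \n p -> GHC.peekCStringLen enc (castPtr p, n)
```

Decoding the bytes [0x61, 0xFF, 0x62] is a convenient probe: with "UTF-8" the invalid byte is reported as an error, with "UTF-8//IGNORE" it is dropped, and with "UTF-8//TRANSLIT" it is replaced by a substitute character.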
Control/Exception/Base.hs | 2 +-
Foreign/C/String.hs | 44 +++++++-
GHC/Conc/Windows.hs | 16 +--
GHC/Environment.hs | 36 +++++-
GHC/Foreign.hs | 255 ++++++++++++++++++++++++++++++++++++++++
GHC/IO.hs | 14 ++-
GHC/IO/Encoding.hs | 78 +++++++++----
GHC/IO/Encoding.hs-boot | 6 +
GHC/IO/Encoding/CodePage.hs | 96 ++++++++-------
GHC/IO/Encoding/Failure.hs | 129 ++++++++++++++++++++
GHC/IO/Encoding/Iconv.hs | 114 ++++++------------
GHC/IO/Encoding/Latin1.hs | 77 +++++++------
GHC/IO/Encoding/Types.hs | 37 ++++--
GHC/IO/Encoding/UTF16.hs | 149 ++++++++++++------------
GHC/IO/Encoding/UTF32.hs | 146 +++++++++++++----------
GHC/IO/Encoding/UTF8.hs | 101 +++++++++-------
GHC/IO/FD.hs | 11 +--
GHC/IO/Handle/Internals.hs | 42 ++++++-
GHC/Windows.hs | 44 +++++++
System/Environment.hs | 219 ++++++++++++++++++++++++++++-------
System/IO.hs | 2 +-
System/Posix/Internals.hs | 32 +++++
System/Posix/Internals.hs-boot | 7 +
base.cabal | 3 +
24 files changed, 1214 insertions(+), 446 deletions(-)
Diff suppressed because of size. To see it, use:
git show 509f28cc93b980d30aca37008cbe66c677a0d6f6