patch applied (cabal): First pass at parsing .cabal files as UTF8

Duncan Coutts duncan.coutts at worc.ox.ac.uk
Mon Feb 25 16:07:08 EST 2008


On Mon, 2008-02-25 at 11:53 +0000, Ross Paterson wrote:
> On Sun, Feb 24, 2008 at 05:46:35PM +0000, Duncan Coutts wrote:
> > I've added readTextFile and writeTextFile to the Utils module and
> > checked all other uses of readFile and writeFile.
> > 
> > I've also switched the rawSystemStdout to assume UTF8 output format.
> 
> The read and write functions ought to open their files in binary mode.
> It's just wrong to read Unicode characters (which is what a plain text
> Handle promises you) and treat them as bytes. There's a similar problem
> with using toUTF on stdout and stderr.  Haskell 98 is very clear that
> putChar on those Handles takes Unicode characters, though it does not
> specify how these are encoded in the environment.  GHC has historically
> assumed an ISO-8859-1 encoding, truncating larger characters, but other
> implementations could map them to the current locale (as Hugs does).
> Perhaps a future GHC will map them to UTF.  I think you should just
> hand the characters to putChar and leave their presentation to the
> implementation, flawed though GHC's currently is.

It is a mess.

It's no use pretending that readFile returns Unicode; it just doesn't
(except on Hugs, which does it properly). GHC is not going to catch up on
this any time soon.

If we open the files in binary mode, we don't get the CR/LF line-ending
conversion on Windows, so we'd have to do that ourselves. Perhaps that's
the way to go.
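A minimal sketch of the binary-mode approach, assuming the .cabal file is UTF-8 encoded: open it with openBinaryFile so that each Char is one raw byte, decode the UTF-8 by hand, then undo Windows CR/LF line endings ourselves. The names decodeUTF8, stripCR and readTextFile are illustrative only and are not necessarily what Cabal's Utils module uses; the decoder is deliberately simplified and substitutes U+FFFD for malformed input.

```haskell
import Data.Bits (shiftL, (.&.), (.|.))
import Data.Char (chr, ord)
import System.IO (IOMode (ReadMode), hGetContents, openBinaryFile)

-- Decode a String whose Chars are raw bytes (0..255) as UTF-8.
-- Simplified sketch: a malformed, truncated, overlong or out-of-range
-- sequence becomes U+FFFD and decoding resumes after the lead byte.
-- Encoded surrogates are not rejected.
decodeUTF8 :: String -> String
decodeUTF8 [] = []
decodeUTF8 (c : cs)
  | b < 0x80 = c : decodeUTF8 cs                 -- ASCII byte
  | b >= 0xC2 && b < 0xE0 = multi 1 (b .&. 0x1F) -- 2-byte sequence
  | b >= 0xE0 && b < 0xF0 = multi 2 (b .&. 0x0F) -- 3-byte sequence
  | b >= 0xF0 && b < 0xF5 = multi 3 (b .&. 0x07) -- 4-byte sequence
  | otherwise = '\xFFFD' : decodeUTF8 cs         -- invalid lead byte
  where
    b = ord c
    -- Consume n continuation bytes, accumulating 6 payload bits from each.
    multi :: Int -> Int -> String
    multi n lead =
      let (conts, rest) = splitAt n cs
          cp = foldl addBits lead conts
          minCp = [0, 0x80, 0x800, 0x10000] !! n
       in if length conts == n && all isCont conts && cp >= minCp && cp <= 0x10FFFF
            then chr cp : decodeUTF8 rest
            else '\xFFFD' : decodeUTF8 cs
    isCont x = ord x .&. 0xC0 == 0x80
    addBits acc x = (acc `shiftL` 6) .|. (ord x .&. 0x3F)

-- Binary mode skips newline translation, so turn CR/LF into LF by hand.
-- A lone CR is left untouched.
stripCR :: String -> String
stripCR ('\r' : '\n' : rest) = '\n' : stripCR rest
stripCR (c : rest) = c : stripCR rest
stripCR [] = []

-- Hypothetical readTextFile: raw bytes in, decoded Unicode text out.
readTextFile :: FilePath -> IO String
readTextFile path = do
  h <- openBinaryFile path ReadMode
  raw <- hGetContents h
  return (stripCR (decodeUTF8 raw))
```

Doing the line-ending conversion after decoding is safe because CR and LF are single ASCII bytes and can never appear inside a multi-byte UTF-8 sequence.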

As for stdout/stderr, we're just stuffed. We cannot reopen them in binary
mode, and Hugs and GHC have different and incompatible behaviour: we
either end up double-encoding with Hugs or not decoding with GHC. There
is no single method that works with both. We'd have to switch on the
system in use.
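One way to switch on the system in use is a CPP conditional on the __HUGS__ macro, which Hugs defines when preprocessing. This is a sketch under stated assumptions: forTerminal is a hypothetical helper applied to a message before putStr, and the GHC branch assumes GHC writes only the low 8 bits of each Char, as described above for GHC at the time. That assumption no longer holds from GHC 6.12, whose Handles encode text via the locale, so on a modern GHC this branch would itself double-encode.

```haskell
{-# LANGUAGE CPP #-}
import Data.Bits (shiftR, (.&.), (.|.))
import Data.Char (chr, ord)

-- Encode Unicode text as UTF-8, producing one Char per byte (all below 256).
encodeUTF8 :: String -> String
encodeUTF8 = concatMap (map chr . encodeCodePoint . ord)
  where
    cont c = 0x80 .|. (c .&. 0x3F) -- continuation byte: 10xxxxxx
    encodeCodePoint c
      | c < 0x80 = [c]
      | c < 0x800 = [0xC0 .|. shiftR c 6, cont c]
      | c < 0x10000 = [0xE0 .|. shiftR c 12, cont (shiftR c 6), cont c]
      | otherwise = [0xF0 .|. shiftR c 18, cont (shiftR c 12), cont (shiftR c 6), cont c]

-- Hypothetical helper: prepare a message before handing it to putStr on
-- stdout or stderr, choosing the behaviour per implementation.
forTerminal :: String -> String
#ifdef __HUGS__
-- Hugs encodes each Char for the current locale itself, so pass the text
-- through unchanged; encoding here as well would double-encode it.
forTerminal = id
#else
-- Assumes GHC writes only the low 8 bits of each Char, as GHC did in 2008.
-- GHC 6.12 and later encode via the locale, so this branch would then
-- double-encode; it only illustrates the historical switch.
forTerminal = encodeUTF8
#endif
```

A call site would then read putStr (forTerminal msg), keeping the per-compiler decision in one place rather than scattered across every output function.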

Duncan
