[Haskell-cafe] Re: Writing binary files?

Glynn Clements glynn.clements at virgin.net
Thu Sep 16 06:52:53 EDT 2004


Simon Marlow wrote:

> >>> Which is why I'm suggesting changing Char to be a byte, so that we
> >>> can have the basic, robust API now and wait for the more advanced
> >>> API, rather than having to wait for a usable API while people sort
> >>> out all of the issues.
> >> 
> >> An easier way is just to declare that the existing API assumes a
> >> Latin-1 encoding consistently.  Later we might add a way to let the
> >> application pick another encoding, or request that the I/O library
> >> uses the locale encoding.
> > 
> > But how do you do that without breaking stuff? If the application
> > changes the encoding to UTF-8 (either explicitly, or by using the
> > locale's encoding when it happens to be UTF-8), then code such as:
> > 
> > 	[filename] <- getArgs
> > 	openFile filename ReadMode
> > 
> > will fail if filename isn't a valid UTF-8 sequence. Similarly for the
> > other cases where the OS accepts/returns byte strings but the Haskell
> > interface uses String.
> 
> And that's the correct behaviour, isn't it?

No. The correct behaviour is to keep such data as byte strings. 
Otherwise it's going to be hard to write robust programs if the
hard-wired ISO-8859-1 encoding is ever changed.
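A minimal sketch of that design follows. It assumes a hypothetical `OsBytes` wrapper, which is not part of any existing Haskell library. The bytes that came from the OS go back to the OS untouched, and decoding to String is a separate, explicit step that the program has to ask for. Latin-1 is used for the decode here only because every byte maps to exactly one Char.

```haskell
import Data.Char (chr)
import Data.Word (Word8)

-- Hypothetical type: an OS-level string (argv entry, file name,
-- environment value) kept as the exact bytes the OS supplied.
newtype OsBytes = OsBytes [Word8]
  deriving (Eq, Show)

-- Handing the value back to the OS needs no conversion, so it cannot fail.
toOs :: OsBytes -> [Word8]
toOs (OsBytes bs) = bs

-- Decoding to String is an explicit, separate step chosen by the program.
-- ISO-8859-1 is used only because it maps every byte to one Char.
decodeLatin1 :: OsBytes -> String
decodeLatin1 (OsBytes bs) = map (chr . fromIntegral) bs

main :: IO ()
main = do
  -- "fo" followed by 0xFF, a byte that never occurs in valid UTF-8
  let name = OsBytes [0x66, 0x6F, 0xFF]
  print (toOs name)          -- the bytes are passed back unchanged
  print (decodeLatin1 name)  -- an explicit, opt-in decode
```

With this arrangement a file name that is not valid in the user's encoding can still be opened. Only an explicit decode to String, if the program asks for one in a stricter encoding, is able to fail.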

In the current implementation, getArgs gets a list of bytes from
argv[], which it converts to a String. The String is passed to
openFile, which converts it back to a list of bytes which are then
passed to open().

Thus the list of bytes is effectively fed through (encode . decode). 
For ISO-8859-*, this is the identity function. For UTF-8, it's a
partial identity function: it either returns its input unchanged or it
fails. I don't see what is to be gained by having it fail. It would be
preferable to just pass the byte string directly from argv[] to open().
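The (encode . decode) argument can be checked with a small sketch. The helpers below are hypothetical and written only for illustration; they are not GHC's own conversion code. The UTF-8 decoder is deliberately strict and rejects overlong forms, surrogates and code points above U+10FFFF. That strictness is what makes a successful decode re-encode to exactly the original bytes. Latin-1 round-trips every byte string, while a Latin-1 file name fed through UTF-8 simply fails.

```haskell
import Control.Monad (guard)
import Data.Bits (shiftL, shiftR, (.&.), (.|.))
import Data.Char (chr, ord)
import Data.Word (Word8)

-- ISO-8859-1: byte n <-> code point n.
decodeLatin1 :: [Word8] -> String
decodeLatin1 = map (chr . fromIntegral)

encodeLatin1 :: String -> Maybe [Word8]
encodeLatin1 = traverse enc
  where enc c | ord c < 256 = Just (fromIntegral (ord c))
              | otherwise   = Nothing

-- Strict UTF-8 decoder: Nothing on any malformed or overlong sequence.
decodeUtf8 :: [Word8] -> Maybe String
decodeUtf8 [] = Just []
decodeUtf8 (b : bs)
  | b < 0x80               = (chr (fromIntegral b) :) <$> decodeUtf8 bs
  | b >= 0xC2 && b <= 0xDF = multi 1 (fromIntegral b .&. 0x1F) 0x80
  | b >= 0xE0 && b <= 0xEF = multi 2 (fromIntegral b .&. 0x0F) 0x800
  | b >= 0xF0 && b <= 0xF4 = multi 3 (fromIntegral b .&. 0x07) 0x10000
  | otherwise              = Nothing
  where
    -- n continuation bytes, initial bits from the lead byte, and the
    -- smallest code point allowed for this length (rejects overlongs)
    multi :: Int -> Int -> Int -> Maybe String
    multi n acc0 minCp = do
      let (conts, rest) = splitAt n bs
      guard (length conts == n && all isCont conts)
      let cp = foldl step acc0 conts
      guard (cp >= minCp && cp <= 0x10FFFF && not (cp >= 0xD800 && cp <= 0xDFFF))
      (chr cp :) <$> decodeUtf8 rest
    isCont :: Word8 -> Bool
    isCont c = c .&. 0xC0 == 0x80
    step :: Int -> Word8 -> Int
    step acc c = (acc `shiftL` 6) .|. (fromIntegral c .&. 0x3F)

encodeUtf8 :: String -> [Word8]
encodeUtf8 = concatMap (enc . ord)
  where
    enc cp
      | cp < 0x80    = [fromIntegral cp]
      | cp < 0x800   = [0xC0 .|. hi 6, cont 0]
      | cp < 0x10000 = [0xE0 .|. hi 12, cont 6, cont 0]
      | otherwise    = [0xF0 .|. hi 18, cont 12, cont 6, cont 0]
      where
        hi s   = fromIntegral (cp `shiftR` s)
        cont s = 0x80 .|. (fromIntegral (cp `shiftR` s) .&. 0x3F)

-- Bytes from argv[] -> String -> bytes handed to open()
roundTripLatin1 :: [Word8] -> Maybe [Word8]
roundTripLatin1 = encodeLatin1 . decodeLatin1

roundTripUtf8 :: [Word8] -> Maybe [Word8]
roundTripUtf8 bs = encodeUtf8 <$> decodeUtf8 bs

main :: IO ()
main = do
  let allBytes = [0 .. 255] :: [Word8]
  print (roundTripLatin1 allBytes == Just allBytes)     -- identity on every byte
  print (roundTripUtf8 [0x63, 0x61, 0x66, 0xC3, 0xA9])  -- "café" in UTF-8: unchanged
  print (roundTripUtf8 [0x63, 0x61, 0x66, 0xE9])        -- "café" in Latin-1: fails
```

The UTF-8 round trip never alters a byte string. It can only reject one, which is exactly the case where passing the raw bytes through to open() would have worked.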

> > I'm less concerned about the handling of streams, as you can
> > reasonably add a way to change the encoding before any data has been
> > read or written. I'm more concerned about FilePaths, argv, the
> > environment etc.
> 
> Yes, these are interesting issues.  Filenames are stored as character
> strings on some OSs (e.g. Windows) and byte strings on others.  So the
> Haskell portable API should probably use String, and do decoding based
> on the locale (if the programmer asks for it).
> 
> Argv and the environment - I don't know.  Windows CreateProcess() allows
> these to be UTF-16 strings, but I don't know what encoding/decoding
> happens between CreateProcess() and what the target process sees in its
> argv[] (can't be bothered to dig through MSDN right now).  I suspect
> these should be Strings in Haskell too, with appropriate
> decoding/encoding happening under the hood.

I suspect that Windows will convert them according to the active (ANSI)
code page, so that CreateFileA(argv[i], ...) works.

-- 
Glynn Clements <glynn.clements at virgin.net>

