[Haskell-cafe] Re: Writing binary files?

Gabriel Ebner ge at gabrielebner.at
Thu Sep 16 08:09:28 EDT 2004


Glynn Clements <glynn.clements at virgin.net> writes:

>> > If you want text, well, tough; what comes out of most system calls and
>> > core library functions (not just read()) are bytes.
>> 
>> Which need to be interpreted by the program depending on where these
>> bytes come from.
>
> They don't necessarily need to be interpreted.

I was thinking of data read from an fd.

> A lot of data simply gets "routed" from one place to another. E.g. a
> program reads a filename from argv[i] and passes it to open(). It
> doesn't matter if the filename is in Klingon.

Right.

> If you *need* an encoding, and don't have any better information, then
> the locale provides a last resort. Decoding bytes according to the
> locale for the sake of it just adds an unnecessary failure mode.

Right.
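
To make the failure mode concrete, here is a minimal sketch. It assumes
a strict ASCII decoder standing in for a locale decoder; RawName, route
and decodeAscii are hypothetical names for illustration, not part of any
existing library.

```haskell
import Data.Char (chr)
import Data.Word (Word8)

-- Hypothetical type for illustration: a filename as raw bytes.
type RawName = [Word8]

-- Routing: the bytes are handed on untouched, so this cannot fail.
route :: RawName -> RawName
route = id

-- Hypothetical strict decoder standing in for a locale decoder
-- (ASCII only). A single byte >= 0x80 makes the whole decode fail.
decodeAscii :: RawName -> Maybe String
decodeAscii = traverse step
  where
    step b
      | b < 0x80  = Just (chr (fromIntegral b))
      | otherwise = Nothing
```

A program that only calls route works for any filename, while one that
calls decodeAscii first has to decide what to do with Nothing.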

> For case testing, locale-dependent sorting and the like, you need to
> convert to characters. [Although possibly only temporarily; you can
> sort a list of byte strings based upon their corresponding character
> strings using sortBy. This means that a decoding failure only means
> that the ordering will be wrong. This is essentially what happens with
> "ls" if you have filenames which aren't valid in the current locale.]

sortBy could only cope with single-byte encodings.  Multi-byte
encodings would need something else.
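
For the single-byte case, the sortBy approach might look roughly like
the sketch below. It assumes ISO-8859-1 as the encoding and uses
case-insensitive comparison as a crude stand-in for locale collation;
decodeLatin1 and sortByDecoded are hypothetical names.

```haskell
import Data.Char (chr, toLower)
import Data.List (sortBy)
import Data.Ord (comparing)
import Data.Word (Word8)

-- Hypothetical decoder: in ISO-8859-1 every byte maps to exactly one
-- character, so decoding can never fail.
decodeLatin1 :: [Word8] -> String
decodeLatin1 = map (chr . fromIntegral)

-- Sort byte strings by a key computed from their decoded form.
-- The result still holds the original, untouched bytes; the decoded
-- strings are only used for the comparison.
sortByDecoded :: [[Word8]] -> [[Word8]]
sortByDecoded = sortBy (comparing (map toLower . decodeLatin1))
```

With this, the byte strings for "B" and "a" come out as "a", "B",
whereas a plain comparison of the bytes would put "B" first.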

> It's broken. Being able to represent filenames as byte strings is
> fundamental. Being able to convert them to or from character strings
> is useful but not essential. The only reason why the existing API
> doesn't cause serious problems is because the translation is currently
> hardwired to an encoding which can't fail.

Handling binary filenames is hardly fundamental.  It isn't even very
portable; see the posts about filename handling under modern Windows.
It might be an important feature, but there are other programs out
there (mostly GUIs) that expect filenames to be encoded according to
the locale settings too.

> By "core library functions", I was referring primarily to libc, not
> the Haskell library functions which were built upon them. The Haskell
> developers can change Haskell, they can't change libc.

And they don't need to change libc.  Libc just passes bytes through.

    Gabriel.