[Haskell] System.FilePath survey

Udo Stenzel u.stenzel at web.de
Wed Feb 8 20:01:01 EST 2006


Einar Karttunen wrote:
> On 08.02 14:03, Wolfgang Thaller wrote:
> > 2) Command lines are usually entered as TEXT on a terminal and are  
> > therefore encoded in whatever encoding the terminal uses.
> 
> Actually I like the ability to delete/copy files even if they
> happen to have filenames in weird Chinese encodings too.

Your shell wouldn't know about that.  Either the weird encoding is UTF-8
anyway, in which case there is no problem, or it is something else, in
which case you don't get Chinese characters, but gibberish.  The program
copying the gibberish wouldn't care, though.


> Very many people needing to use their own language still use other
> things [than UTF-8] and will continue so for the foreseeable future.

Which is actually a shame.  But anyway, that's the reason why a sane
programming language would use the locale settings to decode the command
line, file names and anything else that came from related system calls.

 
> > Maybe it's possible to use some user-defined
> > Unicode code points to achieve a lossless conversion of arbitrary
> > byte strings to Unicode?

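Definitely.  Allocating just 128 code points in the vendor zone (Unicode's
Private Use Area) shouldn't be too hard.  128 are enough, because only the
bytes 0x80 to 0xFF can ever be part of an invalid UTF-8 sequence.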
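
A minimal sketch of such a mapping, assuming UTF-8 as the locale encoding.
The base code point U+EF00 and the names escapeBase, escapeByte and
unescapeChar are hypothetical, chosen only for illustration; any 128
otherwise unused private-use code points would do.

```haskell
import Data.Char (chr, ord)
import Data.Word (Word8)

-- Hypothetical choice: the 128 code points U+EF00..U+EF7F, which lie
-- inside the Private Use Area (U+E000..U+F8FF).
escapeBase :: Int
escapeBase = 0xEF00

-- Map an undecodable byte to its escape code point.  Bytes below 0x80
-- are plain ASCII and never need escaping, so they yield Nothing.
escapeByte :: Word8 -> Maybe Char
escapeByte b
  | b >= 0x80 = Just (chr (escapeBase + fromIntegral b - 0x80))
  | otherwise = Nothing

-- Recover the original byte from an escape code point, if it is one.
unescapeChar :: Char -> Maybe Word8
unescapeChar c
  | n >= escapeBase && n < escapeBase + 128 =
      Just (fromIntegral (n - escapeBase + 0x80))
  | otherwise = Nothing
  where
    n = ord c
```

The two functions are inverses on the bytes 0x80 to 0xFF, which is all the
"lossless" part needs.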

> What would happen if you tried to output such a String? The raw
> bytes or the escaped versions?

There are no raw bytes.  Outputting a string means encoding it into
whatever the locale says or whatever the convention of a particular
library mandates.  This will often be the same encoding that was used to
decode filenames in the first place, so you get the same byte sequence
back.  If that happens to be an invalid UTF-8 sequence, so be it.  It
was broken to begin with, so we're no worse off than if we ignored
encoding issues completely.
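
The round trip can be sketched as follows.  This is an illustration under
stated assumptions, not an existing library API: the locale encoding is
taken to be UTF-8, the escape range U+EF00..U+EF7F is a hypothetical
choice, and decode and encode are made-up names.  One further assumption
goes beyond what is said above: a well-formed UTF-8 encoding of an escape
code point is itself treated as invalid input, so that every byte sequence
survives decode followed by encode unchanged.

```haskell
import Data.Bits (shiftL, shiftR, (.&.), (.|.))
import Data.Char (chr, ord)
import Data.Word (Word8)

-- Hypothetical escape range: U+EF00..U+EF7F in the Private Use Area.
escapeBase :: Int
escapeBase = 0xEF00

isEscape :: Int -> Bool
isEscape n = n >= escapeBase && n < escapeBase + 128

-- Decode UTF-8; every byte that is not part of a valid sequence becomes
-- one escape code point instead of an error.
decode :: [Word8] -> String
decode [] = []
decode (b : bs)
  | b < 0x80 = chr (fromIntegral b) : decode bs
  | otherwise = case multibyte b bs of
      Just (c, rest) -> c : decode rest
      Nothing        -> chr (escapeBase + fromIntegral b - 0x80) : decode bs

-- Try to read one multi-byte sequence starting with lead byte b.
multibyte :: Word8 -> [Word8] -> Maybe (Char, [Word8])
multibyte b bs
  | b .&. 0xE0 == 0xC0 = go 1 0x80    (fromIntegral (b .&. 0x1F))
  | b .&. 0xF0 == 0xE0 = go 2 0x800   (fromIntegral (b .&. 0x0F))
  | b .&. 0xF8 == 0xF0 = go 3 0x10000 (fromIntegral (b .&. 0x07))
  | otherwise          = Nothing
  where
    go n minVal hi =
      let (conts, rest) = splitAt n bs
          isCont x = x .&. 0xC0 == 0x80
          v = foldl (\acc x -> (acc `shiftL` 6) .|. fromIntegral (x .&. 0x3F))
                    hi conts
      in if length conts == n && all isCont conts && valid minVal v
           then Just (chr v, rest)
           else Nothing
    -- Reject overlong forms, surrogates, out-of-range values, and
    -- (by assumption) encoded escape code points.
    valid minVal v =
      v >= minVal && v <= 0x10FFFF
        && not (v >= 0xD800 && v <= 0xDFFF)
        && not (isEscape v)

-- Encode to UTF-8; escape code points turn back into their original byte.
encode :: String -> [Word8]
encode = concatMap encodeChar

encodeChar :: Char -> [Word8]
encodeChar c
  | isEscape n  = [fromIntegral (n - escapeBase + 0x80)]
  | n < 0x80    = [fromIntegral n]
  | n < 0x800   = [0xC0 .|. top 6, cont 0]
  | n < 0x10000 = [0xE0 .|. top 12, cont 6, cont 0]
  | otherwise   = [0xF0 .|. top 18, cont 12, cont 6, cont 0]
  where
    n      = ord c
    top s  = fromIntegral (n `shiftR` s)
    cont s = 0x80 .|. fromIntegral ((n `shiftR` s) .&. 0x3F)
```

With these definitions, a file name containing the stray byte 0xFF decodes
to a String with one escape character in it, and encode gives back exactly
the bytes that came in.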

> Also this would mean that Haskell Unicode != Unicode

Not at all.  The escape codes wouldn't leave the Haskell program in any
form other than an invalid UTF-8 sequence, which is also the only way
they could ever enter it.  Nobody would ever notice the hack.


Udo.
-- 
Delusions are often functional. A mother's opinions about her children's
beauty, intelligence, goodness, et cetera ad nauseam, keep her from
drowning them at birth.