[Haskell] System.FilePath survey

Einar Karttunen ekarttun at cs.helsinki.fi
Wed Feb 8 18:39:37 EST 2006


On 08.02 14:03, Wolfgang Thaller wrote:
> 1) Widely used languages and libraries like Java and GTK+ assume that  
> all file names and command lines are encoded in the system locale, or  
> at least that they can all be converted to unicode strings.

Which causes much annoyance to users, who have to define various
environment variables just to get such programs to open a file.

> 2) Command lines are usually entered as TEXT on a terminal and are  
> therefore encoded in whatever encoding the terminal uses.

Actually I like the ability to delete/copy files even if they
happen to have filenames in weird Chinese encodings.
Users just use wildcards or tab completion to get around
filenames that are hard to type.

> 3) None of the recent linux distributions I have installed did  
> anything but set up a UTF-8 based system.

Very many people who need to use their own language still use
other encodings, and will continue to do so for the foreseeable future.

> So I think we should try hard to avoid introducing any additional  
> complexity, like filename ADTs used for program arguments, to deal  
> with the small minority of systems where file names cannot be  
> converted to unicode. Maybe it's possible to use some user-defined  
> unicode code points to achieve a lossless conversion of arbitrary  
> byte strings to unicode? I mean, byte strings that are valid in the  
> system encoding would get transcoded correctly, and invalid bytes  
> would get mapped to some extra code points so that they can be  
> converted back if necessary.

What would happen if you tried to output such a String? Would you
get the raw bytes or the escaped versions? Also this would mean that
Haskell unicode != unicode (isn't Java's broken handling
enough?).
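
To make the question concrete, here is a minimal sketch of the kind of
scheme being proposed, assuming the system encoding is UTF-8 and that
the escape range is U+EF80..U+EFFF in the private use area. The range
and all the names below (escBase, decode, encode, display) are
hypothetical and not part of any existing library. Note that a valid
UTF-8 encoding of an escape code point has to be treated as invalid on
input, otherwise the round trip is not lossless, which is exactly where
"Haskell unicode != unicode" shows up.

```haskell
import Data.Bits (shiftL, shiftR, (.&.), (.|.))
import Data.Char (chr, ord)
import Data.Word (Word8)
import Numeric (showHex)

-- Hypothetical escape range: an undecodable byte b (0x80..0xFF)
-- is represented as the private-use code point escBase + b.
escBase :: Int
escBase = 0xEF00

isEscape :: Char -> Bool
isEscape c = n >= escBase + 0x80 && n <= escBase + 0xFF
  where n = ord c

-- Decode UTF-8, escaping every byte that does not start a valid
-- sequence. Sequences that decode to an escape code point are
-- rejected too, so that encode . decode is the identity on bytes.
decode :: [Word8] -> String
decode [] = []
decode bs@(b : rest) =
  case decodeOne bs of
    Just (c, rest') | not (isEscape c) -> c : decode rest'
    _ -> chr (escBase + fromIntegral b) : decode rest

-- Try to decode one well-formed UTF-8 sequence from the front.
decodeOne :: [Word8] -> Maybe (Char, [Word8])
decodeOne [] = Nothing
decodeOne (b0 : bs)
  | b0 < 0x80           = Just (chr (fromIntegral b0), bs)
  | b0 .&. 0xE0 == 0xC0 = multi 1 (b0 .&. 0x1F) 0x80
  | b0 .&. 0xF0 == 0xE0 = multi 2 (b0 .&. 0x0F) 0x800
  | b0 .&. 0xF8 == 0xF0 = multi 3 (b0 .&. 0x07) 0x10000
  | otherwise           = Nothing
  where
    multi n lead minCp =
      let (conts, rest) = splitAt n bs
          step acc c = (acc `shiftL` 6) .|. fromIntegral (c .&. 0x3F)
          cp = foldl step (fromIntegral lead) conts :: Int
          ok = length conts == n
            && all (\c -> c .&. 0xC0 == 0x80) conts
            && cp >= minCp                        -- no overlong forms
            && cp <= 0x10FFFF
            && not (cp >= 0xD800 && cp <= 0xDFFF) -- no surrogates
      in if ok then Just (chr cp, rest) else Nothing

-- Encode back to bytes: escape code points become the raw byte
-- they stand for, everything else is ordinary UTF-8.
encode :: String -> [Word8]
encode = concatMap encodeChar

encodeChar :: Char -> [Word8]
encodeChar c
  | isEscape c  = [fromIntegral (n - escBase)]
  | n < 0x80    = [fromIntegral n]
  | n < 0x800   = [0xC0 .|. hi 6, cont 0]
  | n < 0x10000 = [0xE0 .|. hi 12, cont 6, cont 0]
  | otherwise   = [0xF0 .|. hi 18, cont 12, cont 6, cont 0]
  where
    n = ord c
    hi s = fromIntegral (n `shiftR` s)
    cont s = 0x80 .|. (fromIntegral (n `shiftR` s) .&. 0x3F)

-- One possible answer for output meant for humans: show the
-- escaped bytes in a visible form instead of writing them raw.
display :: String -> String
display = concatMap render
  where
    render c
      | isEscape c = "\\x" ++ showHex (ord c - escBase) ""
      | otherwise  = [c]
```

With this sketch a program has to pick encode when passing the name
back to the OS and display when printing it for a user, so the
ambiguity does not go away; it only moves to every place a String is
output.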

- Einar Karttunen

