Plan for file processing libs

Benjamin Franksen benjamin.franksen at bessy.de
Sun Nov 26 17:34:57 EST 2006


Bulat Ziganshin wrote:
> Sunday, November 26, 2006, 5:44:55 PM, you wrote:
>>> * FilePath operations implemented via Stringable class
> 
>> Except that for POSIX systems Stringable isn't the appropriate class for
>> FilePath operations, but rather ByteString.
> 
> oh, sorry, i considered ByteString here as a sort of string, but unix
> filename is just a byte sequence. nevertheless, in order to implement
> operations of FilePath module, we should rely on some *encoding*. if
> some unixes may use encoding other than utf8/latin1/other ascii-compatible
> we have a problem of recognizing this encoding. one cannot extract,
> for example, basename without knowing encoding of '.' and '/'

ASCII, of course. The problem is that in Unix any sequence of bytes is allowed
as a file name, except '/' (byte 0x2F, reserved as the directory separator)
and 0 (the NUL byte, reserved as the string terminator). Thus you can have
filenames that are not valid in most encodings, utf8 included. You can decode
them as latin1, which accepts every byte, but this may be misleading because
other filenames may be intended to be read as utf8. It is all very bad. In
principle, the encoding can change from one directory to the next, or even
from one file to the next in the same directory. You can play funny jokes on
people you work with by giving them files whose names contain backspace
characters and similar oddities...
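
To make the point concrete, here is a minimal sketch of how basename and
extension splitting could work directly on the raw bytes, assuming only that
'/' is byte 0x2F and '.' is byte 0x2E. The names baseName, splitExt and
samplePath are made up for illustration; they are not part of any existing
FilePath library.

```haskell
import qualified Data.ByteString as B
import Data.Word (Word8)

-- Assumption: separators are the ASCII bytes, whatever the rest of the
-- name is encoded in.
slash, dot :: Word8
slash = 0x2F  -- '/'
dot   = 0x2E  -- '.'

-- | Last path component: the bytes after the final '/' byte.
-- No decoding happens, so names that are invalid utf8 pass through intact.
baseName :: B.ByteString -> B.ByteString
baseName = snd . B.breakEnd (== slash)

-- | Split a name at its last '.' byte into (stem, extension).
-- A '.' in first position (a dotfile) is not treated as an extension.
splitExt :: B.ByteString -> (B.ByteString, B.ByteString)
splitExt name = case B.elemIndexEnd dot name of
  Just i | i > 0 -> B.splitAt i name
  _              -> (name, B.empty)

-- | Hypothetical path "/tmp/<0xFF><0xFE>.txt"; 0xFF and 0xFE never occur
-- in valid utf8, so this name cannot be decoded as utf8 at all.
samplePath :: B.ByteString
samplePath = B.pack [0x2F, 0x74, 0x6D, 0x70, 0x2F, 0xFF, 0xFE, 0x2E, 0x74, 0x78, 0x74]

main :: IO ()
main = do
  let (stem, ext) = splitExt (baseName samplePath)
  print (B.unpack (baseName samplePath))
  print (B.unpack stem, B.unpack ext)
```

This only holds for ASCII-compatible encodings; it says nothing about how
to display such a name, which is where the real trouble starts.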

Ben


