[Haskell] System.FilePath survey

John Meacham john at repetae.net
Wed Feb 8 18:50:09 EST 2006


On Wed, Feb 08, 2006 at 09:10:37PM +0000, Ben Rudiak-Gould wrote:
> John Meacham wrote:
> >On Tue, Feb 07, 2006 at 04:25:35PM +0000, Ben Rudiak-Gould wrote:
> >>                 Posix       NT             Win9x
> >>
> >>pathnames        bytes       UTF-16         locale
> >>command line     bytes       UTF-16         locale
> >>file contents    bytes       bytes          bytes
> >>pipes/sockets    bytes       bytes          bytes
> >
> >actually, Posix systems should be the following
> >
> >>pathnames        locale      UTF-16         locale
> >>command line     locale      UTF-16         locale
> >>file contents    *           bytes          bytes
> >>pipes/sockets    *           bytes          bytes
> >
> >Although the Posix interface is in terms of bytes, the strings should
> >always be interpreted via the locale specified in $LANG or $LC_CTYPE
> >also, for file contents and pipes/sockets, if you are passing text, and
> >in the absence of some overriding standard or protocol, you should be
> >using the encoding specified in the locale too.
> 
> But that's an application-level convention; the kernel only knows about 
> bytes. On Windows the encoding of pathnames and the command line is a 
> requirement imposed by the kernel. I think assuming the locale encoding for 
> the command line on Posix is a bad idea. Users are unlikely to pass a 
> misencoded command line explicitly, but I want my-haskell-util `find .` to 
> work even on a mounted volume that uses the wrong encoding. (And I also 
> want your-haskell-util to work, even if you didn't write it with this 
> situation in mind.)

When the command line is to be interpreted as a string, interpreting
it in the current locale is definitely the right thing to do. This is
why we need two varieties of getArgs: one that returns [String] and one
that returns [[Word8]]. I doubt the second form will be needed much,
since you usually think of command line arguments as strings, but it
should be provided because its absence can't really be worked around.
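
To make the two varieties concrete, here is a rough sketch of how they
could sit side by side. It is only an illustration under stated
assumptions, not a proposal for the actual library interface: the names
argToBytes and getArgsBytes are hypothetical, and the code assumes a
present-day GHC on a Posix system, where System.Environment.getArgs
decodes arguments with the locale-derived file system encoding and
escapes undecodable bytes so that they can be re-encoded unchanged.

```haskell
import Data.Word (Word8)
import Foreign.Marshal.Array (peekArray)
import Foreign.Ptr (castPtr)
import GHC.Foreign (withCStringLen)
import GHC.IO.Encoding (TextEncoding, getFileSystemEncoding)
import System.Environment (getArgs)

-- The [String] variety is System.Environment.getArgs itself:
-- arguments interpreted as text in the current locale.

-- | Hypothetical helper: encode one argument back into bytes using
-- the given encoding.
argToBytes :: TextEncoding -> String -> IO [Word8]
argToBytes enc s =
  withCStringLen enc s $ \(ptr, len) -> peekArray len (castPtr ptr)

-- | Hypothetical [[Word8]] variety: the arguments as bytes.
-- Assumption: the file system encoding round-trips undecodable bytes,
-- so this recovers what the process was actually given.
getArgsBytes :: IO [[Word8]]
getArgsBytes = do
  enc  <- getFileSystemEncoding
  args <- getArgs
  mapM (argToBytes enc) args
```

A program that only prints or parses its arguments would keep using the
[String] form. One that hands them straight back to the file system,
as in the `find .` case above, would use the byte form so that a
misencoded name survives untouched. Deriving the bytes from the decoded
strings is just a convenience of this sketch; a real implementation
would more naturally read the bytes first and decode from them.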
        John

-- 
John Meacham - ⑆repetae.net⑆john⑈

