[Haskell-cafe] ANNOUNCE: system-filepath 0.4.5 and system-fileio 0.3.4

Joey Hess joey at kitenet.net
Mon Feb 6 19:05:13 CET 2012


John Millikin wrote:
> That was my understanding also, then QuickCheck found a
> counter-example. It turns out that there are cases where a valid path
> cannot be roundtripped in the GHC 7.2 encoding.

> The issue is that  [238,189,178] decodes to 0xEF72, which is within
> the 0xEF00-0xEFFF range that GHC uses to represent un-decodable bytes.

How did you deal with this in system-filepath?

While no code points in the Supplementary Special-purpose Plane are currently
assigned (http://www.unicode.org/roadmaps/ssp/), it is worrying that it's used,
especially if filenames in a non-unicode encoding could be interpreted as
containing characters really within this plane. I wonder why maxBound :: Char
was not increased, and the additional space after `\1114111' used for the
un-decodable bytes?
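
To make the quoted counter-example concrete, here is a minimal sketch that
decodes the three bytes by hand and checks the result against the escape range
quoted above. The names decode3 and inEscapeRange are made up for illustration,
and the decoder is deliberately incomplete (it does not reject overlong forms
or surrogates).

```haskell
import Data.Bits (shiftL, (.&.), (.|.))
import Data.Word (Word8)
import Numeric (showHex)

-- Hypothetical helper: decode one three-byte UTF-8 sequence by hand.
-- Only the bit layout 1110xxxx 10xxxxxx 10xxxxxx is checked.
decode3 :: Word8 -> Word8 -> Word8 -> Maybe Int
decode3 b0 b1 b2
  | b0 .&. 0xF0 == 0xE0 && isCont b1 && isCont b2 =
      Just (((fromIntegral b0 .&. 0x0F) `shiftL` 12)
        .|. ((fromIntegral b1 .&. 0x3F) `shiftL` 6)
        .|.  (fromIntegral b2 .&. 0x3F))
  | otherwise = Nothing
  where
    isCont b = b .&. 0xC0 == 0x80

-- The range the quoted text says GHC 7.2 uses for un-decodable bytes.
inEscapeRange :: Int -> Bool
inEscapeRange c = c >= 0xEF00 && c <= 0xEFFF

main :: IO ()
main =
  case decode3 238 189 178 of
    Nothing -> putStrLn "not a three-byte UTF-8 sequence"
    Just c  -> putStrLn ("0x" ++ showHex c "" ++ " in escape range: "
                         ++ show (inEscapeRange c))
```

So a perfectly valid UTF-8 filename can land on a code point that the decoder
also uses as an escape, which is why the round trip is ambiguous.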

> > For FFI, anything that deals with a FilePath should use this
> > withFilePath, which GHC contains but doesn't export(?), rather than the
> > old withCString or withCAString:
> >
> > import GHC.IO.Encoding (getFileSystemEncoding)
> > import GHC.Foreign as GHC
> >
> > withFilePath :: FilePath -> (CString -> IO a) -> IO a
> > withFilePath fp f = getFileSystemEncoding >>= \enc -> GHC.withCString enc fp f
> 
> If code uses either withFilePort or withCString, then the filenames
                      withFilePath?
> written will depend on the user's locale. This is wrong. Filenames are
> either non-encoded text strings (Windows), UTF8 (OSX), or arbitrary
> bytes (non-OSX POSIX). They must not change depending on the locale.

This is exactly how GHC 7.4 handles them. For example:

openDirStream :: FilePath -> IO DirStream
openDirStream name =
  withFilePath name $ \s -> do
    dirp <- throwErrnoPathIfNullRetry "openDirStream" name $ c_opendir s
    return (DirStream dirp)

removeLink :: FilePath -> IO ()
removeLink name =
  withFilePath name $ \s ->
  throwErrnoPathIfMinus1_ "removeLink" name (c_unlink s)

I do not see any locale-dependent behavior in the filename bytes read or written.

> > Code that reads or writes a FilePath to a Handle (including even to
> > stdout!) must take care to set the right encoding too:
> >
> > fileEncoding :: Handle -> IO ()
> > fileEncoding h = hSetEncoding h =<< getFileSystemEncoding
> 
> This is also wrong. A "file path" cannot be written to a handle with
> any hope of correct behavior. If it's to be displayed to the user, a
> path should be converted to text first, then displayed.

Sure it can. See find(1). Its output can be read as FilePaths once the
Handle is set up as above.
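
A rough sketch of what I mean, assuming GHC 7.4 or later and one path per line
on stdin (so it breaks on filenames containing newlines, same as plain find
output does). readPaths is a name invented here, not something from any
library.

```haskell
import GHC.IO.Encoding (getFileSystemEncoding)
import System.IO (Handle, hGetContents, hSetEncoding, stdin)

-- Same helper as in the quoted text above.
fileEncoding :: Handle -> IO ()
fileEncoding h = hSetEncoding h =<< getFileSystemEncoding

-- Hypothetical helper: read newline-separated paths from a handle,
-- decoding them the same way GHC decodes filenames from the OS.
readPaths :: Handle -> IO [FilePath]
readPaths h = do
  fileEncoding h
  fmap lines (hGetContents h)

main :: IO ()
main = do
  paths <- readPaths stdin
  -- The resulting FilePaths can be handed to openFile and friends.
  putStrLn (show (length paths) ++ " paths read")
```

Run it as something like "find . | ./readpaths" (the program name is just an
example).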

If you prefer that your program not crash with an encoding error when an
arbitrary FilePath is passed to putStr, but instead perhaps output bytes that
are not valid in the current encoding, that's also a valid choice. You might
be writing a program, like find, that again needs to output any possible
FilePath including badly encoded ones.

Filesystem.Path.CurrentOS.toText is a nice option if you want validly
encoded output though. Thanks for that!

> This is new in 7.4, and won't be backported, right? I tried compiling
> the new "unix" package in 7.2 to get proper file path support, but it
> failed with an error about some new language extension.

The RawFilePath is just a ByteString, so your existing converters for that
in system-filepath might work.
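
For illustration, a minimal sketch of listing a directory as raw bytes,
assuming a "unix" package version that ships the System.Posix.ByteString
modules. listRaw is a made-up name; the names come back exactly as the
filesystem stores them, with no decoding step.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Control.Exception (bracket)
import qualified Data.ByteString.Char8 as B
import System.Posix.ByteString (RawFilePath)
import System.Posix.Directory.ByteString
  (closeDirStream, openDirStream, readDirStream)

-- Hypothetical helper: collect every entry name in a directory as bytes.
-- readDirStream returns an empty string at the end of the stream.
listRaw :: RawFilePath -> IO [RawFilePath]
listRaw dir = bracket (openDirStream dir) closeDirStream loop
  where
    loop ds = do
      name <- readDirStream ds
      if B.null name
        then return []
        else fmap (name :) (loop ds)

main :: IO ()
main = listRaw "." >>= mapM_ B.putStrLn  -- bytes are written out unchanged
```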

-- 
see shy jo