behaviour change in getDirectoryContents in GHC 7.2?

Max Bolingbroke batterseapower at hotmail.com
Wed Nov 2 14:53:46 CET 2011


On 2 November 2011 09:37, Max Bolingbroke <batterseapower at hotmail.com> wrote:
> On 1 November 2011 20:13, John Millikin <jmillikin at gmail.com> wrote:
>> $ ghci-7.2.1
>> GHC> import System.Directory
>> GHC> getDirectoryContents "path-test"
>> ["\161\165","\61345\61349","..","."]
>> GHC> readFile "path-test/\161\165"
>> "world\n"
>> GHC> readFile "path-test/\61345\61349"
>> *** Exception: path-test/: openFile: does not exist (No such file or
>> directory)
>
> Thanks for the example! I can reproduce this on Linux (haven't tried
> OS X or Windows) and AFAICT this behaviour is just a straight-up bug
> and is *not* intended behaviour. I'm not sure why the tests aren't
> catching it.

I've tracked it down and this bug arises in the following situation:
 1. You are not running on Windows
 2. You are attempting to encode a string containing the private-use
escape codepoints
 3. You are using an iconv (such as the one in GNU libc) that, in
contravention of the Unicode standard, does not signal EILSEQ when it
encounters surrogate code points in non-UTF-16 input
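The escape code points in John's example line up with the raw bytes of the file name. Below is a minimal sketch, assuming (inferred from the GHCi output above, not taken from GHC's source) that each undecodable byte b in the range 0x80..0xFF is represented as the private-use code point U+EF00 + b. `escapeBase`, `escapeByte` and `unescapeChar` are hypothetical names used only for illustration.

```haskell
import Data.Char (chr, ord)
import Data.Word (Word8)

-- Assumption inferred from the example output, not GHC's actual code:
-- an undecodable byte b (0x80..0xFF) is escaped as U+EF00 + b.
escapeBase :: Int
escapeBase = 0xEF00

-- Map a raw byte that could not be decoded to its escape character.
escapeByte :: Word8 -> Char
escapeByte b = chr (escapeBase + fromIntegral b)

-- Recover the original byte from an escape character, if it is one.
unescapeChar :: Char -> Maybe Word8
unescapeChar c
  | n >= escapeBase + 0x80 && n <= escapeBase + 0xFF =
      Just (fromIntegral (n - escapeBase))
  | otherwise = Nothing
  where
    n = ord c

main :: IO ()
main = do
  -- The raw two-byte file name from the bug report: 0xA1 0xA5.
  print (map escapeByte [161, 165])       -- "\61345\61349"
  print (map unescapeChar "\61345\61349") -- [Just 161,Just 165]
```

Under this assumption, the entry "\61345\61349" is presumably the raw byte sequence 0xA1 0xA5, while "\161\165" is the UTF-8 encoding of U+00A1 U+00A5. Opening the escaped name only works if encoding maps each escape back to its original byte, which is the step that goes wrong here.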

I've got a patch that will work around the issue in most situations by
avoiding the iconv code path. With the patch everything will work OK
as long as the system locale is one that we have a native-Haskell
decoder for (in practice, UTF-8). You can therefore still hit the bug
only if all three conditions above hold AND your system locale is not
UTF-8.
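Avoiding iconv means the escapes must be handled by the Haskell-side encoder itself. The following is a minimal sketch of such a native UTF-8 encoder, assuming the same U+EF00 + byte escape scheme inferred from the example. `encodeFileName` is a hypothetical name; this is an illustration of the idea, not the code in the patch.

```haskell
import Data.Bits (shiftR, (.&.), (.|.))
import Data.Char (ord)
import Data.Word (Word8)

-- Assumed escape scheme (inferred, not GHC's source):
-- U+EF80..U+EFFF stand for the raw bytes 0x80..0xFF.
escapeBase :: Int
escapeBase = 0xEF00

-- Encode a file name to UTF-8, writing escape characters back out as the
-- raw bytes they stand for. Because escapes are handled here, the round
-- trip does not rely on an external converter rejecting them.
-- Note: this sketch does not reject lone surrogates.
encodeFileName :: String -> [Word8]
encodeFileName = concatMap encodeChar
  where
    encodeChar c
      | n >= escapeBase + 0x80 && n <= escapeBase + 0xFF =
          [fromIntegral (n - escapeBase)]     -- escaped raw byte
      | n < 0x80    = [fromIntegral n]        -- 1-byte (ASCII)
      | n < 0x800   = [0xC0 .|. lead 6, cont 0]
      | n < 0x10000 = [0xE0 .|. lead 12, cont 6, cont 0]
      | otherwise   = [0xF0 .|. lead 18, cont 12, cont 6, cont 0]
      where
        n = ord c
        -- High bits that go into the leading byte.
        lead k = fromIntegral (n `shiftR` k)
        -- A continuation byte carrying 6 payload bits.
        cont k = 0x80 .|. (fromIntegral (n `shiftR` k) .&. 0x3F)

main :: IO ()
main = do
  print (encodeFileName "\61345\61349") -- [161,165]
  print (encodeFileName "caf\233")      -- [99,97,102,195,169]
```

Handling the escapes inside the encoder means correctness no longer depends on iconv signalling EILSEQ for surrogates, which is the behaviour condition 3 says GNU libc's iconv lacks.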

I think the only way to fix this last case in general is to fix iconv
itself, so I'm going to see if I can get a patch upstream. Fixing it
for people with UTF-8 locales should be enough for 99% of users,
though.

Max


