Windows testsuite failures
Simon Peyton-Jones
simonpj at microsoft.com
Thu Nov 3 10:29:06 CET 2011
Thanks. I've dumped your commentary into http://hackage.haskell.org/trac/ghc/ticket/5599.
Ian: could you make these expect-broken(5599) on msys?
Thanks
| -----Original Message-----
| From: omega.theta at gmail.com [mailto:omega.theta at gmail.com] On Behalf Of Max
| Bolingbroke
| Sent: 02 November 2011 21:36
| To: Simon Peyton-Jones
| Cc: cvs-ghc at haskell.org
| Subject: Re: Windows testsuite failures
|
| On 2 November 2011 10:32, Simon Peyton-Jones <simonpj at microsoft.com> wrote:
| > Unicode stuff (I assume)
| > lib/IO 3307 [bad exit code] (normal)
| > lib/IO environment001 [bad stdout] (normal)
|
| I've never been able to reproduce this with Cygwin, but I rebuilt GHC
| on msys and managed to find out what is going on. Basically, msys has
| rather poor Unicode support. If you write a program "len.c" like this:
|
| """
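| #include <windows.h>
| #include <stdio.h>
| #include <wchar.h>
|
| int main(int _argc, char **_argv) {
|     /* Ask Windows for the raw UTF-16 command line and split it into arguments */
|     LPWSTR cmdLine = GetCommandLineW();
|
|     int argc;
|     LPWSTR *argv = CommandLineToArgvW(cmdLine, &argc);
|     if (argv == NULL || argc < 2) {
|         fprintf(stderr, "usage: a.exe ARG\n");
|         return 1;
|     }
|
|     /* wcslen returns size_t, so cast it to match the printf format */
|     printf("%d args, %lu wide chars in first arg\n", argc,
|            (unsigned long) wcslen(argv[1]));
|
|     LocalFree(argv);  /* CommandLineToArgvW allocates with LocalAlloc */
|     return 0;
| }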
| """
|
| Create a UTF-8 encoded file called "utf8" containing two characters:
| """
| 不好
| """
|
| And then execute it like so:
| """
| gcc len.c && ./a.exe $(cat utf8)
| """
|
| (NB: it is irrelevant whether you use Cygwin gcc or msys gcc: this is
| an issue with the *shells*)
|
| You get different results on msys and Cygwin:
|
| * On Cygwin, you get 2 wide characters in the first argument -- i.e.
| the UTF-16 encoded Chinese text
| * On msys, you get 6 wide characters in the first argument -- i.e.
| one 16-bit value for *every byte* in the UTF-8 encoded Chinese text
|
| IMHO the msys behaviour is broken because the command line arguments
| supplied via the Windows API are meant to be UTF-16. It *does* match
| the behaviour of Windows cmd if you do this:
|
| """
| set /p myvar= < utf8
| a.exe %myvar%
| """
|
| (You get "6 wide characters" printed)
|
| Perhaps the issue in cmd stems from the fact that the Windows console
| is stuck in code page 850 and doesn't support the UTF-8 "code page".
| But msys really has no excuse since it reports itself as being UTF-8.
|
| I'm not sure what to do here because I don't think our code actually
| has a problem, and the test does pass (and check something useful) on
| Linux, OS X and Cygwin. But still, something is not working quite
| right here. Perhaps just mark it as expect-fail on msys?
|
| Max