[Haskell-cafe] invalid character encoding

Ian Lynagh igloo at earth.li
Sat Mar 19 14:14:25 EST 2005


On Wed, Mar 16, 2005 at 11:55:18AM +0000, Ross Paterson wrote:
> On Wed, Mar 16, 2005 at 03:54:19AM +0000, Ian Lynagh wrote:
> > Do you have a list of functions which behave differently in the new
> > release to how they did in the previous release?
> > (I'm not interested in changes that will affect only whether something
> > compiles, not how it behaves given it compiles both before and after).
> 
> I got lost in the negatives here.  It affects all Haskell 98 primitives
> that do character I/O, or that exchange C strings with the C library.

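In the transcript below, it looks like there is a bug in getDirectoryContents: once a file with a non-ASCII byte in its name exists, it fails with a garbage-collection error instead of listing the directory.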

Also, the error from w.hs is going to stdout, not stderr.

Most importantly, though: is there any way to remove this file without
doing something like an FFI import of unlink?
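
For reference, the FFI workaround mentioned above might look roughly like the following sketch. It assumes a POSIX system whose C library provides unlink(2) and a Haskell implementation that supports the standard FFI. The file name is passed as raw bytes, so no locale conversion can turn 0xA2 into '?'. removeFileBytes is a hypothetical helper name, not part of the Directory module.

```haskell
import Data.Word (Word8)
import Foreign.C.String (CString)
import Foreign.C.Types (CInt)
import Foreign.Marshal.Array (withArray0)
import Foreign.Ptr (castPtr)

-- Bind directly to C's unlink(2); the bytes we hand it are not re-encoded.
foreign import ccall unsafe "unistd.h unlink"
  c_unlink :: CString -> IO CInt

-- Hypothetical helper: remove a file whose name is given as raw bytes.
-- withArray0 appends the terminating NUL that C expects.
-- Returns True when unlink reports success (return value 0).
removeFileBytes :: [Word8] -> IO Bool
removeFileBytes bytes =
  withArray0 0 bytes $ \p -> do
    r <- c_unlink (castPtr p)
    return (r == 0)

main :: IO ()
main = do
  ok <- removeFileBytes [0x31, 0xA2]  -- the name "1\xA2" from the transcript
  putStrLn (if ok then "removed" else "unlink failed")
```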

Is there anything LC_CTYPE can be set to that will act like C/POSIX but
accept 8-bit bytes as chars too?
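
For comparison, the behaviour asked for here (every 8-bit byte accepted as a character) is what a one-byte-per-character mapping such as ISO-8859-1 gives: byte n decodes to the Char with code point n, so decoding never fails and the original bytes can always be recovered. This is only an illustration of that mapping in Haskell, not an LC_CTYPE setting; decodeBytes and encodeChars are hypothetical names.

```haskell
import Data.Char (chr, ord)
import Data.Word (Word8)

-- Every byte 0x00..0xFF maps to the Char with the same code point
-- (the ISO-8859-1 mapping), so decoding can never fail.
decodeBytes :: [Word8] -> String
decodeBytes = map (chr . fromIntegral)

-- The inverse is only defined for Chars below U+0100; anything else
-- has no single-byte representation, so the whole result is Nothing.
encodeChars :: String -> Maybe [Word8]
encodeChars = mapM toByte
  where
    toByte c
      | ord c < 0x100 = Just (fromIntegral (ord c))
      | otherwise     = Nothing

main :: IO ()
main = do
  let name = [0x31, 0xA2] :: [Word8]  -- the bytes of the file name "1\xA2"
  print (decodeBytes name)            -- a two-Char String, no error
  print (encodeChars (decodeBytes name) == Just name)  -- True: round-trips
```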


(in the POSIX locale)
$ echo 'import Directory; main = getDirectoryContents "." >>= print' > q.hs
$ runhugs q.hs 
[".","..","q.hs"]
$ touch 1`printf "\xA2"`
$ runhugs q.hs
runhugs: Error occurred

ERROR - Garbage collection fails to reclaim sufficient space


$ echo 'import Directory; main = removeFile "1\xA2"' > w.hs
$ runhugs w.hs

Program error: 1?: Directory.removeFile: does not exist (file does not exist)
$ strace -o strace.out runhugs w.hs > /dev/null
$ grep unlink strace.out | head -c 14 | hexdump -C
00000000  75 6e 6c 69 6e 6b 28 22  31 3f 22 29 20 20        |unlink("1?")  |
0000000e
$ strace -o strace2.out rm 1*
$ grep unlink strace2.out | head -c 14 | hexdump -C
00000000  75 6e 6c 69 6e 6b 28 22  31 a2 22 29 20 20        |unlink("1.")  |
0000000e
$ 
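
The strace output suggests the name goes through a lossy conversion to the locale's character set before reaching unlink: '\xA2' has no ASCII representation and comes out as '?' (0x3F), so unlink is asked to remove "1?", which does not exist. A minimal model of that behaviour, assuming a plain '?' substitution (encodeAsciiLossy is a hypothetical name, not the Hugs code):

```haskell
import Data.Char (ord)
import Data.Word (Word8)

-- Hypothetical model of encoding a String for the POSIX/C locale:
-- characters outside ASCII cannot be represented and are silently
-- replaced by '?' instead of raising an error.
encodeAsciiLossy :: String -> [Word8]
encodeAsciiLossy = map toByte
  where
    toByte c
      | ord c < 0x80 = fromIntegral (ord c)
      | otherwise    = fromIntegral (ord '?')

main :: IO ()
main = print (encodeAsciiLossy "1\xA2")  -- [49,63], i.e. the bytes of "1?"
```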



Now consider this e.hs:

--------------------
import IO

main = do hWaitForInput stdin 10000
          putStrLn "Input is ready"
          r <- hReady stdin
          print r
          c <- hGetChar stdin
          print c
          putStrLn "Done!"
--------------------

$ { printf "\xC2\xC2\xC2\xC2\xC2\xC2\xC2"; sleep 30; } | runhugs e.hs
Input is ready
True

Program error: <stdin>: IO.hGetChar: protocol error (invalid character encoding)
$ 

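The error is only printed after 30 seconds, i.e. once sleep exits and the
pipe is closed. This shows two issues. First, I think you should be
giving an error as soon as the bytes read so far are not a prefix of any
character: in UTF-8, for instance, 0xC2 must be followed by a
continuation byte (0x80-0xBF), so the second 0xC2 already makes the
input invalid. Second, hReady now only guarantees that hGetChar won't
block on a binary-mode handle, since on a text handle the bytes that are
available may be only part of a character. I guess there is not much we
can do about that except document it (short of some hideous hacks).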
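
The first point can be made concrete with an incremental decoder. The sketch below assumes the handle was decoding UTF-8 (the locale used for e.hs is not stated above) and is not the Hugs implementation; Step, lead and decodeStep are hypothetical names. It answers Invalid as soon as no character can start with the bytes seen, so the input fed to e.hs would be rejected after its second byte.

```haskell
import Data.Bits ((.&.), (.|.), shiftL)
import Data.Char (chr)
import Data.Word (Word8)

-- Result of trying to decode one character from the bytes read so far.
data Step
  = Done Char [Word8] -- a complete character, plus any leftover bytes
  | NeedMore          -- a valid prefix: must wait for more input
  | Invalid           -- no UTF-8 character starts with these bytes
  deriving (Eq, Show)

-- For a lead byte: how many continuation bytes follow, the payload bits
-- it carries, and the allowed range of the first continuation byte.
-- The narrower ranges exclude overlong forms, surrogates and values
-- above U+10FFFF, following the UTF-8 syntax given in RFC 3629.
lead :: Word8 -> Maybe (Int, Int, Word8, Word8)
lead b
  | b <= 0x7F              = Just (0, fromIntegral b, 0x80, 0xBF)
  | b >= 0xC2 && b <= 0xDF = Just (1, bits 0x1F, 0x80, 0xBF)
  | b == 0xE0              = Just (2, bits 0x0F, 0xA0, 0xBF)
  | b >= 0xE1 && b <= 0xEC = Just (2, bits 0x0F, 0x80, 0xBF)
  | b == 0xED              = Just (2, bits 0x0F, 0x80, 0x9F)
  | b >= 0xEE && b <= 0xEF = Just (2, bits 0x0F, 0x80, 0xBF)
  | b == 0xF0              = Just (3, bits 0x07, 0x90, 0xBF)
  | b >= 0xF1 && b <= 0xF3 = Just (3, bits 0x07, 0x80, 0xBF)
  | b == 0xF4              = Just (3, bits 0x07, 0x80, 0x8F)
  | otherwise              = Nothing -- 0x80..0xC1, 0xF5..0xFF never start a character
  where
    bits m = fromIntegral (b .&. m)

-- Decode one character, answering Invalid as soon as the bytes seen so
-- far cannot be the start of any character, rather than waiting for EOF.
decodeStep :: [Word8] -> Step
decodeStep [] = NeedMore
decodeStep (b0 : rest) = case lead b0 of
  Nothing -> Invalid
  Just (n, acc, lo, hi) -> collect n acc lo hi rest
  where
    collect :: Int -> Int -> Word8 -> Word8 -> [Word8] -> Step
    collect 0 acc _ _ bs = Done (chr acc) bs
    collect _ _ _ _ [] = NeedMore
    collect n acc lo hi (b : bs)
      | b >= lo && b <= hi =
          -- later continuation bytes always use the full 0x80..0xBF range
          collect (n - 1) ((acc `shiftL` 6) .|. fromIntegral (b .&. 0x3F)) 0x80 0xBF bs
      | otherwise = Invalid

main :: IO ()
main = do
  print (decodeStep [0xC2])       -- NeedMore: could still become a character
  print (decodeStep [0xC2, 0xC2]) -- Invalid: known after two bytes
  print (decodeStep [0xC2, 0xA2]) -- Done with U+00A2 and no leftover bytes
```

In this model, a hGetChar that reached Invalid could fail immediately, while NeedMore is exactly the case where hReady may have returned True (a byte was available) yet hGetChar still has to block.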


Thanks
Ian


