[GHC] #5436: text decoding doesn't use recover on eof

GHC cvs-ghc at haskell.org
Sun Aug 28 20:40:37 CEST 2011


#5436: text decoding doesn't use recover on eof
---------------------------------+------------------------------------------
    Reporter:  judahj            |       Owner:              
        Type:  bug               |      Status:  new         
    Priority:  normal            |   Component:  Compiler    
     Version:  7.2.1             |    Keywords:              
    Testcase:                    |   Blockedby:              
          Os:  Unknown/Multiple  |    Blocking:              
Architecture:  Unknown/Multiple  |     Failure:  None/Unknown
---------------------------------+------------------------------------------
 GHC 7.2.1 lets a `TextEncoding` recover from decoding errors through the
 `recover` function of its codec.  However, that recovery is not applied to
 an incomplete byte sequence at the end of a file; in that case the decoder
 throws an exception regardless of the recovery function.  This makes it
 difficult to ensure that a program won't throw an exception on bad input.

 Reproduction steps:
 {{{
 ghc --make GetChar.hs
 ghc -e "Data.ByteString.hPut System.IO.stdout (Data.ByteString.pack [200])" | ./GetChar
 }}}
 where `GetChar.hs` is the following module:

 {{{

 {-# LANGUAGE RecordWildCards #-}
 module Main where

 import System.IO
 import GHC.IO.Encoding
 import GHC.IO.Encoding.Failure

 main = do
     mkRecoveringLocaleEncoding "UTF-8" >>= hSetEncoding stdin
     getChar >>= print

 mkRecoveringLocaleEncoding :: String -> IO TextEncoding
 mkRecoveringLocaleEncoding name = do
     enc <- mkTextEncoding name
     return $ case enc of
         TextEncoding {..} -> TextEncoding {
                 mkTextDecoder = fmap (setRecover $ recoverDecode TransliterateCodingFailure)
                                     mkTextDecoder,
                 mkTextEncoder = fmap (setRecover $ recoverEncode TransliterateCodingFailure)
                                     mkTextEncoder,..
             }
   where
     setRecover r x = x { recover = r }
 }}}

 Result:
 {{{
 GetChar: <stdin>: hGetChar: invalid argument (invalid byte sequence for this encoding)
 }}}
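
 The byte 200 (0xC8) triggers this because it is the lead byte of a two-byte
 UTF-8 sequence, so the stream ends while the decoder is still waiting for a
 continuation byte.  As a minimal sketch, assuming a POSIX shell with `printf`
 and `od`, the same input can be produced without `ghc -e`; the file name
 `incomplete.bin` is only an illustrative choice:

```shell
# Write the single byte 0xC8 (decimal 200, octal 310): a UTF-8 lead byte
# with no continuation byte after it.
printf '\310' > incomplete.bin

# Print the file's contents as unsigned decimal byte values to confirm
# that it holds exactly one byte with value 200.
od -An -tu1 incomplete.bin

# The file can then be fed to the test program built above:
#   ./GetChar < incomplete.bin
```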

 In the course of investigating the issue, I found the following comment
 near the definition of GHC.IO.Handle.streamEncode:
 {{{
 -- FIXME: we should use recover to deal with EOF, rather than always
 -- throwing an IOException (ioe_invalidCharacter).
 }}}
 So I guess this ticket records my vote to fix that problem.

-- 
Ticket URL: <http://hackage.haskell.org/trac/ghc/ticket/5436>
GHC <http://www.haskell.org/ghc/>
The Glasgow Haskell Compiler


