patch applied (cabal): Make UTF-8 decoding errors in .cabal files non-fatal

Ross Paterson ross at soi.city.ac.uk
Thu Mar 27 12:03:33 EDT 2008


On Thu, Mar 27, 2008 at 08:45:29AM -0700, Duncan Coutts wrote:
> Wed Mar 26 20:17:40 PDT 2008  Duncan Coutts <duncan at haskell.org>
>   * Make UTF-8 decoding errors in .cabal files non-fatal
>   Previously we checked for invalid UTF-8 in the first phase of the
>   parser, which splits the file up into nested sections and fields.
>   This meant the whole parser falls over if there is invalid UTF-8
>   anywhere in the file. Sadly there are already packages on hackage
>   with invalid UTF-8 so we would fail when parsing the hackage index.
>   The solution is to move the check into the parsing of the individual
>   fields and make it a warning, not an error. We most typically get
>   invalid UTF-8 in free text fields like author name, copyright,
>   description, etc., so this should usually work out OK.
>   We now get pretty decent error messages, like:
>     Warning: hsx.cabal:5: Invalid UTF-8 text in the 'author' field.
>   The warning type is now structured so that hackage will be able to
>   distinguish general non-fatal warnings from UTF-8 decoding problems
>   which really should be fatal errors for package uploads. 

These invalid UTF-8 strings are usually people's names in valid Latin-1,
which the web interface needs to show.

So would it be possible to give the warning, but either to treat bytes
that comprise an encoding error as Latin-1 Chars, or to reparse a string
(or file) with UTF errors as a Latin-1 string?  In almost all cases, the
problematic sequence is a single non-ASCII byte surrounded by ASCII bytes.
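For illustration, here is a minimal sketch of the first option in Haskell. It assumes the field contents are available as a list of raw bytes. `decodeUtf8Latin1` and `decodeOne` are hypothetical names, not existing Cabal functions. Each well-formed UTF-8 sequence is decoded normally. A byte that does not start a valid sequence becomes the Latin-1 Char with the same code point, and decoding resumes at the following byte. The Bool result records whether any fallback happened, so the parser can still emit the warning.

```haskell
import Data.Bits (shiftL, (.&.), (.|.))
import Data.Char (chr)
import Data.Word (Word8)

-- | Hypothetical helper: decode UTF-8, but map each byte that does not
-- begin a well-formed UTF-8 sequence to the Latin-1 Char with the same
-- value. The Bool is True if any such fallback happened, so the caller
-- can still report a warning.
decodeUtf8Latin1 :: [Word8] -> (String, Bool)
decodeUtf8Latin1 = go False
  where
    go bad [] = ([], bad)
    go bad bs@(b : rest) =
      case decodeOne bs of
        Just (c, rest') ->
          let (cs, bad') = go bad rest' in (c : cs, bad')
        Nothing ->
          -- Treat the single offending byte as Latin-1, resume at the next one.
          let (cs, bad') = go True rest in (chr (fromIntegral b) : cs, bad')

-- | Decode one well-formed UTF-8 sequence from the front of the input,
-- rejecting overlong forms, surrogates and code points above U+10FFFF.
decodeOne :: [Word8] -> Maybe (Char, [Word8])
decodeOne [] = Nothing
decodeOne (b0 : rest)
  | b0 < 0x80 = Just (chr (fromIntegral b0), rest)
  | b0 >= 0xC2 && b0 <= 0xDF = multi 1 (fromIntegral b0 .&. 0x1F) 0x80
  | b0 >= 0xE0 && b0 <= 0xEF = multi 2 (fromIntegral b0 .&. 0x0F) 0x800
  | b0 >= 0xF0 && b0 <= 0xF4 = multi 3 (fromIntegral b0 .&. 0x07) 0x10000
  | otherwise = Nothing
  where
    -- n continuation bytes, leading payload bits acc0,
    -- smallest code point this length may legally encode minCp
    multi :: Int -> Int -> Int -> Maybe (Char, [Word8])
    multi n acc0 minCp =
      let (conts, rest') = splitAt n rest
          cp = foldl step acc0 conts
       in if length conts == n
            && all isCont conts
            && cp >= minCp
            && cp <= 0x10FFFF
            && not (cp >= 0xD800 && cp <= 0xDFFF)
            then Just (chr cp, rest')
            else Nothing
    -- continuation bytes have the form 10xxxxxx
    isCont c = c .&. 0xC0 == 0x80
    -- append the low six payload bits of a continuation byte
    step acc c = (acc `shiftL` 6) .|. (fromIntegral c .&. 0x3F)
```

So `decodeUtf8Latin1 [0x63, 0x61, 0x66, 0xE9]` gives `("caf\233", True)`, i.e. "café" plus a flag to warn on, while the correctly encoded `[0x63, 0x61, 0x66, 0xC3, 0xA9]` gives the same string with `False`. The second option (reparsing the whole field as Latin-1 whenever the flag is set) would just be `map (chr . fromIntegral)` over the original bytes.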
