[Haskell-cafe] Core packages and locale support

Roman Cheplyaka roma at ro-che.info
Fri Jun 25 17:56:30 EDT 2010


* Brandon S Allbery KF8NH <allbery at ece.cmu.edu> [2010-06-25 05:00:08-0400]
> On 6/25/10 02:42 , Roman Cheplyaka wrote:
> > * Jason Dagit <dagit at codersbase.com> [2010-06-24 20:52:03-0700]
> >> On Sat, Jun 19, 2010 at 1:06 AM, Roman Cheplyaka <roma at ro-che.info> wrote:
> >>> While GHC 6.12 finally has proper locale support, core packages (such as
> >>> unix) still use withCString and therefore work incorrectly when an argument
> >>> (e.g. a file path) is not ASCII.
> >>
> >> Pardon me if I'm misunderstanding withCString, but my understanding of Unix
> >> paths is that they are to be treated as strings of bytes.  That is, unlike
> >> Windows, they do not have an encoding predefined.  Furthermore, you could
> >> have two filepaths in the same directory with different encodings due to
> >> this.
> > 
> > You got everything right here. So, as you said, there is a mismatch
> > between the representation in Haskell (a list of code points) and the
> > representation in the operating system (a list of bytes), so we need to
> > know the encoding. The encoding is supplied by the user via the locale
> > (https://secure.wikimedia.org/wikipedia/en/wiki/Locale), in particular the
> > LC_CTYPE variable.
> 
> You might want to look at how Python is dealing with this (including the
> pain involved; best to learn from example).
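
A minimal sketch of the code-point/byte mismatch described above, assuming a GHC that ships the GHC.Foreign module (newer than the 6.12 discussed here), whose withCStringLen takes an explicit TextEncoding. It shows that one name maps to different bytes under UTF-8 and Latin-1, and that truncating each code point to 8 bits (which is what castCharToCChar does) corrupts names outside Latin-1. The helper names encodeWith and truncateBytes are hypothetical, for illustration only.

```haskell
import qualified GHC.Foreign as GHC
import Data.Word (Word8)
import Foreign.C.String (castCharToCChar)
import Foreign.C.Types (CChar)
import Foreign.Marshal.Array (peekArray)
import System.IO (TextEncoding, latin1, utf8)

-- Hypothetical helper: encode a String with an explicit TextEncoding and
-- return the raw bytes that would be handed to the OS (e.g. as a path).
encodeWith :: TextEncoding -> String -> IO [Word8]
encodeWith enc s = GHC.withCStringLen enc s $ \(ptr, len) ->
  map fromIntegral <$> (peekArray len ptr :: IO [CChar])

-- Hypothetical helper: naive marshalling that keeps only the low 8 bits
-- of each code point, i.e. truncating every Char to a CChar.
truncateBytes :: String -> [Word8]
truncateBytes = map (fromIntegral . castCharToCChar)

main :: IO ()
main = do
  -- The same name yields different bytes under different encodings...
  encodeWith utf8 "café" >>= print
  encodeWith latin1 "café" >>= print
  -- ...and truncation silently corrupts names outside Latin-1.
  print (truncateBytes "файл")
```

With UTF-8, 'é' becomes the two bytes 0xC3 0xA9, while Latin-1 gives the single byte 0xE9, so two files that both display as "café" can coexist in one directory. Truncation turns 'ф' (U+0444) into 0x44, the ASCII letter 'D'.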

Do you mean the pain when file names cannot be decoded using the current
locale settings, so the files become inaccessible? (The same applies to
environment variables.)
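
For comparison, a sketch of a Python-style escape hatch (in the spirit of PEP 383): later GHC releases, not 6.12, accept an encoding name with a //ROUNDTRIP suffix in mkTextEncoding, which maps undecodable bytes to special code points so that encoding the resulting String gives back the original bytes. Strict UTF-8 simply fails on the same input. The byte sequence and the helpers decodeBytes, encodeString and roundTrips are hypothetical, for illustration only.

```haskell
import Control.Exception (IOException, evaluate, try)
import qualified GHC.Foreign as GHC
import Data.Word (Word8)
import Foreign.Marshal.Array (peekArray, withArrayLen)
import System.IO (TextEncoding, mkTextEncoding, utf8)

-- Hypothetical helper: decode raw OS bytes (e.g. a file name) to a String.
decodeBytes :: TextEncoding -> [Word8] -> IO String
decodeBytes enc bytes =
  withArrayLen (map fromIntegral bytes) $ \len ptr ->
    GHC.peekCStringLen enc (ptr, len)

-- Hypothetical helper: encode a String back into raw bytes.
encodeString :: TextEncoding -> String -> IO [Word8]
encodeString enc s = GHC.withCStringLen enc s $ \(ptr, len) ->
  map fromIntegral <$> peekArray len ptr

-- True if the bytes survive decode-then-encode with the given encoding.
roundTrips :: TextEncoding -> [Word8] -> IO Bool
roundTrips enc bytes = do
  s <- decodeBytes enc bytes
  back <- encodeString enc s
  return (back == bytes)

-- "f", a byte that is never valid in UTF-8, "o".
badName :: [Word8]
badName = [0x66, 0xFF, 0x6F]

main :: IO ()
main = do
  -- Strict UTF-8 decoding throws an IOException on the invalid byte.
  strict <- try (decodeBytes utf8 badName >>= \s -> evaluate (length s) >> return s)
              :: IO (Either IOException String)
  putStrLn (either (const "strict UTF-8: decoding failed") show strict)
  -- The roundtrip encoding keeps the name usable and reversible.
  roundtrip <- mkTextEncoding "UTF-8//ROUNDTRIP"
  roundTrips roundtrip badName >>= print
```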

Agreed, it's unpleasant. The alternative would be to change [Char] to [Word8]
or ByteString. But this would a) break all existing programs and b) be
an OS-specific hack. Crap.

Brandon, do you have any ideas on how we should proceed with this?

-- 
Roman I. Cheplyaka :: http://ro-che.info/
"Don't let school get in the way of your education." - Mark Twain

