<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">

<html>

  <head>

    <meta content="text/html; charset=ISO-8859-1"

      http-equiv="Content-Type">

  </head>

  <body bgcolor="#ffffff" text="#000000">

    <br>

    <br>

    On 25.06.10 20:09, Jason Dagit wrote:

    <blockquote

      cite="mid:AANLkTik_s8rr23Lgq7-olobiTvChxfNLX3c7i0j18YVw@mail.gmail.com"

      type="cite"><br>

      <div class="gmail_quote">

        <blockquote class="gmail_quote" style="margin: 0pt 0pt 0pt

          0.8ex; border-left: 1px solid rgb(204, 204, 204);

          padding-left: 1ex;">

          you got everything right here. So, as you said, there is a

          mismatch<br>

          between representation in Haskell (list of code points) and<br>

          representation in the operating system (list of bytes), so we

          need to<br>

          know the encoding. Encoding is supplied by the user via locale<br>

          (<a moz-do-not-send="true"

            href="https://secure.wikimedia.org/wikipedia/en/wiki/Locale"

            target="_blank">https://secure.wikimedia.org/wikipedia/en/wiki/Locale</a>),

          particularly<br>

          LC_CTYPE variable.<br>

          <br>

          The problem with encodings is not new -- it was already solved

          e.g. for<br>

          input/output.<br>

        </blockquote>

        <div><br>

        </div>

        <div>This is the part where I don't understand the problem well.

          &nbsp;I thought that with IO the program assumes the locale of the

          environment but that with filepaths you don't know what locale

          (more specifically which encoding) they were created with. &nbsp;So

          if you try to treat them as having the locale of the current

          environment you run the risk of misunderstanding their

          encoding.</div>

      </div>

      <br>

    </blockquote>

    Incorrectly encoded file paths are common on, e.g., Cyrillic Linux

    systems, because several encodings are in use there (CP1251, KOI8-R,

    UTF-8). The usual fix is to adjust the current locale and the mount

    options of the media. There is no need to change the program or to

    tell the program which character encoding to use. It is not a

    programming language issue.<br>
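    <br>

    To illustrate how a program picks up the encoding from the

    environment rather than from its own code, here is a minimal sketch.

    It assumes GHC with base 4.5 or later, where

    <code>GHC.IO.Encoding</code> exports <code>getLocaleEncoding</code>

    and <code>getFileSystemEncoding</code>; on older compilers these

    functions are not available.<br>

    <pre>import GHC.IO.Encoding (getFileSystemEncoding, getLocaleEncoding)

main :: IO ()
main = do
  -- Encoding taken from the current locale (on POSIX systems this
  -- follows LC_ALL / LC_CTYPE / LANG).
  localeEnc &lt;- getLocaleEncoding
  -- Encoding the runtime uses to convert between FilePath strings
  -- and the bytes stored by the operating system.
  fsEnc &lt;- getFileSystemEncoding
  putStrLn ("locale encoding:      " ++ show localeEnc)
  putStrLn ("file system encoding: " ++ show fsEnc)
</pre>

    Running the same binary under different settings, for example

    <code>LC_ALL=ru_RU.KOI8-R</code> and then

    <code>LC_ALL=ru_RU.UTF-8</code> (assuming both locales are

    installed), should report different encodings without any change to

    the program.<br>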

    <pre class="moz-signature" cols="72">-- 

Best regards,

  Roman Beslik.

</pre>

  </body>

</html>