<br><div class="gmail_quote">On Wed, Mar 30, 2011 at 09:26, Jason Dagit <span dir="ltr">&lt;<a href="mailto:dagitj@gmail.com" target="_blank">dagitj@gmail.com</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">


<br><br><div class="gmail_quote"><div>On Tue, Mar 29, 2011 at 11:52 PM, Michael Snoyman <span dir="ltr">&lt;<a href="mailto:michael@snoyman.com" target="_blank">michael@snoyman.com</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">


Hi all,<br>

<br>

I think this is a well-known issue: it seems that there is no<br>

character decoding performed on the values returned from the functions<br>

in System.Directory (getDirectoryContents specifically). I could<br>

manually do something like (utf8Decode . S8.pack), but that presumes<br>

that the character encoding on the system in question is UTF8. So two<br>

questions:<br>

<br>

* Is there a package out there that handles all the gory details for<br>

me automatically, and simply returns a properly decoded String (or<br>

Text)?<br>

* If not, is there a standard way to determine the character encoding<br>

used by the filesystem, short of hard-coding in character encodings<br>

used by the major ones?<br></blockquote><div><br></div></div><div>I started to write a thoughtful reply, but I found that the answers here sum up everything I was going to say:</div><div><a href="http://unix.stackexchange.com/questions/2089/what-charset-encoding-is-used-for-filenames-and-paths-on-linux" target="_blank">http://unix.stackexchange.com/questions/2089/what-charset-encoding-is-used-for-filenames-and-paths-on-linux</a></div>
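<div>For illustration, the manual workaround from the question (decoding the raw bytes as UTF-8) might look like the following minimal sketch. It assumes, as the question describes, that each Char returned by getDirectoryContents holds one undecoded byte. The name decodeUtf8 is hypothetical and not taken from any package. Invalid sequences become U+FFFD, which makes the cost of the UTF-8 assumption visible: a Latin-1 name decodes to replacement characters and cannot be mapped back to its original bytes.</div>

```haskell
import Data.Bits (shiftL, (.&.), (.|.))
import Data.Char (chr)
import Data.Word (Word8)

-- Hypothetical helper (not from any particular package): decode raw
-- filename bytes as UTF-8. Invalid, truncated, overlong or surrogate
-- sequences are replaced with U+FFFD instead of raising an error.
decodeUtf8 :: [Word8] -> String
decodeUtf8 [] = []
decodeUtf8 (b : bs)
  | b < 0x80               = chr (fromIntegral b) : decodeUtf8 bs
  | b >= 0xC2 && b <= 0xDF = multi 1 (fromIntegral b .&. 0x1F) 0x80
  | b >= 0xE0 && b <= 0xEF = multi 2 (fromIntegral b .&. 0x0F) 0x800
  | b >= 0xF0 && b <= 0xF4 = multi 3 (fromIntegral b .&. 0x07) 0x10000
  | otherwise              = replacement : decodeUtf8 bs
  where
    replacement = '\xFFFD'
    -- Consume n continuation bytes (10xxxxxx), adding 6 bits from each;
    -- minCp rejects overlong encodings of the same code point.
    multi :: Int -> Int -> Int -> String
    multi n acc minCp =
      case splitAt n bs of
        (conts, rest)
          | length conts == n && all isCont conts ->
              let cp = foldl (\a c -> (a `shiftL` 6) .|. (fromIntegral c .&. 0x3F)) acc conts
              in if cp < minCp || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)
                   then replacement : decodeUtf8 rest
                   else chr cp : decodeUtf8 rest
        -- Too few or malformed continuation bytes: drop only the lead byte.
        _ -> replacement : decodeUtf8 bs
    isCont c = c .&. 0xC0 == 0x80
```

<div>Under the same assumption, a listing could be decoded with something like <code>fmap (map (decodeUtf8 . map (fromIntegral . fromEnum))) (getDirectoryContents ".")</code>.</div>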


<div><br></div><div>This same issue comes up from time to time for darcs and, if I recall correctly, the solution has been to treat Unix file paths as arbitrary bytes whenever possible and to escape bytes that are not ASCII-compatible when they occur. Otherwise it can be hard to encode them in textual patch descriptions or XML (where an encoding is required, and I believe UTF-8 is the standard default).</div>
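<div>A minimal sketch of that escaping idea, assuming a hypothetical scheme (this is not darcs&#39;s actual format): printable ASCII passes through, a backslash is doubled, and every other byte is written as \xHH. The output is pure ASCII, so it fits into a patch description or an XML document in any encoding, and unescapeBytes (also a hypothetical name) recovers the original bytes.</div>

```haskell
import Data.Char (chr, digitToInt, intToDigit, isHexDigit, ord)
import Data.Word (Word8)

-- Hypothetical escaping scheme: keep printable ASCII bytes, double any
-- backslash, and write every other byte as \xHH (lowercase hex), so the
-- result is plain ASCII regardless of the file system's real encoding.
escapeBytes :: [Word8] -> String
escapeBytes = concatMap esc
  where
    esc :: Word8 -> String
    esc 0x5C = "\\\\"  -- a literal backslash becomes two backslashes
    esc b
      | b >= 0x20 && b < 0x7F = [chr (fromIntegral b)]
      | otherwise = ['\\', 'x', hex (b `div` 16), hex (b `mod` 16)]
    hex :: Word8 -> Char
    hex = intToDigit . fromIntegral

-- Inverse of escapeBytes; Nothing if the text is not a valid escaping.
unescapeBytes :: String -> Maybe [Word8]
unescapeBytes [] = Just []
unescapeBytes ('\\' : '\\' : rest) = (0x5C :) <$> unescapeBytes rest
unescapeBytes ('\\' : 'x' : h : l : rest)
  | isHexDigit h && isHexDigit l =
      (fromIntegral (digitToInt h * 16 + digitToInt l) :) <$> unescapeBytes rest
unescapeBytes ('\\' : _) = Nothing
unescapeBytes (c : rest)
  | ord c >= 0x20 && ord c < 0x7F = (fromIntegral (ord c) :) <$> unescapeBytes rest
  | otherwise = Nothing
```

<div>The lossless round trip is the point of this design: unlike decoding with a guessed charset, no file name is altered or lost, at the cost of less readable output for non-ASCII names.</div>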


<div><br></div><div>I wish you luck. It&#39;s not an easy problem, at least on Unix. I&#39;ve heard that Windows has a much easier time here, as Microsoft has provided a standard for it.</div></div></blockquote><div><br></div>


<div>All the more reason, it seems, to make this available in the standard package, so people don&#39;t have to figure out how to do the conversions each time (for all the different OSes, with which they might not have any experience, etc.).</div>


<div><br></div><div>All modern Linux distributions use UTF-8 by default anyway, so in the beginning one could assume UTF-8 and later extend the system to make more intelligent decisions (like checking environment variables for per-user settings). A way to override those assumptions would also be necessary, I guess.</div>
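<div>The policy described above (assume UTF-8, consult the per-user locale variables, allow an explicit override) could be sketched as follows. The function names are hypothetical. The lookup order LC_ALL, then LC_CTYPE, then LANG follows the usual POSIX precedence, and treating a locale without a codeset (such as "C") as UTF-8 is a deliberate simplification, not what a real C library would report. The environment is passed in as a function so the policy can be tested without touching the real process environment.</div>

```haskell
import Data.Maybe (fromMaybe, mapMaybe)
import System.Environment (lookupEnv)

defaultCharset :: String
defaultCharset = "UTF-8"

-- Hypothetical policy: an explicit override wins; otherwise use the first
-- non-empty value of LC_ALL, LC_CTYPE, LANG (POSIX precedence order), and
-- fall back to UTF-8 when nothing usable is found.
chooseCharset :: Maybe String -> (String -> Maybe String) -> String
chooseCharset (Just forced) _ = forced
chooseCharset Nothing env =
  case filter (not . null) (mapMaybe env ["LC_ALL", "LC_CTYPE", "LANG"]) of
    (locale : _) -> fromMaybe defaultCharset (codeset locale)
    []           -> defaultCharset

-- Extract the codeset from "language[_territory][.codeset][@modifier]".
-- Simplification: a locale with no codeset (e.g. "C") yields Nothing.
codeset :: String -> Maybe String
codeset locale =
  case takeWhile (/= '@') (drop 1 (dropWhile (/= '.') locale)) of
    "" -> Nothing
    cs -> Just cs

-- Apply the policy to the real process environment
-- (lookupEnv is available in base >= 4.6).
currentCharset :: Maybe String -> IO String
currentCharset override = do
  vals <- mapM (\k -> fmap ((,) k) (lookupEnv k)) ["LC_ALL", "LC_CTYPE", "LANG"]
  let env k = lookup k vals >>= id
  return (chooseCharset override env)
```

<div>For example, <code>currentCharset Nothing &gt;&gt;= putStrLn</code> prints the charset this sketch would assume for the current process.</div>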


<div><br></div><div>-Tako</div><div><br></div></div>