[Haskell-cafe] Encoding-aware System.Directory functions

Bas van Dijk v.dijk.bas at gmail.com
Wed Mar 30 21:18:48 CEST 2011


On 30 March 2011 18:07, Michael Snoyman <michael at snoyman.com> wrote:
> On Wed, Mar 30, 2011 at 9:26 AM, Jason Dagit <dagitj at gmail.com> wrote:
>>
>>
>> On Tue, Mar 29, 2011 at 11:52 PM, Michael Snoyman <michael at snoyman.com>
>> wrote:
>>>
>>> Hi all,
>>>
>>> I think this is a well-known issue: it seems that there is no
>>> character decoding performed on the values returned from the functions
>>> in System.Directory (getDirectoryContents specifically). I could
>>> manually do something like (utf8Decode . S8.pack), but that presumes
>>> that the character encoding on the system in question is UTF8. So two
>>> questions:
>>>
>>> * Is there a package out there that handles all the gory details for
>>> me automatically, and simply returns a properly decoded String (or
>>> Text)?
>>> * If not, is there a standard way to determine the character encoding
>>> used by the filesystem, short of hard-coding in character encodings
>>> used by the major ones?
>>
>> I started to write a thoughtful reply, but I found that the answers here sum
>> up everything I was going to say:
>> http://unix.stackexchange.com/questions/2089/what-charset-encoding-is-used-for-filenames-and-paths-on-linux
>> This same issue comes up from time to time for darcs and, if I recall
>> correctly, the solution has been to treat unix file paths as arbitrary bytes
>> whenever possible and to escape non-ascii compatible bytes when they occur.
>>  Otherwise it can be hard to encode them in textual patch descriptions or
>> xml (where an encoding is required and I believe utf8 is a standard
>> default).
>> I wish you luck.  It's not an easy problem, at least on unix.  I've heard
>> that windows has a much easier time here as MS has provided a standard for
>> it.
>> Jason
>
> Thanks to you (and everyone else) for the informative responses. For
> now, I've simply hard-coded in UTF-8 encoding for all non-Windows
> systems. I'm not sure how this will play with OSes besides Windows and
> Linux (especially Mac), but it's a good stop-gap measure.
>
> I *do* think it would be incredibly useful to provide alternatives to
> all the standard operations on FilePath which use opaque datatypes
> and properly handle filename encoding. I noticed John Millikin's
> system-filepath package[1]. Do people have experience with it? It
> seems that adding a few functions like getDirectoryContents, plus
> adding a version of toString which performs some character decoding,
> would get us pretty far.
>
> Michael
>
> [1] http://hackage.haskell.org/package/system-filepath
>

It would also be great to have a package which combines the proper
encoding/decoding of filepaths of the system-filepath package with the
type-safety of the pathtype package:
http://hackage.haskell.org/package/pathtype
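
To make the idea concrete, here is a rough sketch of what such a
combination might look like. It is not the API of either package: every
name below (Abs, Rel, File, Dir, Path, root, dir, file, toBytes, display)
is made up for illustration. The assumption is that a path keeps its
components as raw bytes, so nothing is lost when a name is not valid
text, while phantom type parameters record whether the path is absolute
or relative and whether it names a file or a directory.

```haskell
import Data.Char (chr)
import Data.List (intercalate)
import Data.Word (Word8)
import Numeric (showHex)

-- Phantom tags; hypothetical names, only loosely modelled on pathtype.
data Abs
data Rel
data File
data Dir

-- Components are stored as raw bytes, so a name that is not valid text
-- in any encoding is still kept unchanged.
newtype Path base kind = Path [[Word8]]

root :: Path Abs Dir
root = Path []

dir :: [Word8] -> Path Rel Dir
dir name = Path [name]

file :: [Word8] -> Path Rel File
file name = Path [name]

-- Only a directory can be extended, and only by a relative path.
(</>) :: Path base Dir -> Path Rel kind -> Path base kind
Path xs </> Path ys = Path (xs ++ ys)

-- The exact bytes to hand back to the operating system (47 is '/').
toBytes :: Path Abs kind -> [Word8]
toBytes (Path []) = [47]
toBytes (Path cs) = concatMap (47 :) cs

-- Lossy rendering for display only: printable ASCII bytes are shown as
-- characters, every other byte is escaped as \xNN.
display :: Path Abs kind -> String
display (Path cs) = "/" ++ intercalate "/" (map (concatMap showByte) cs)
  where
    showByte b
      | b >= 32 && b < 127 = [chr (fromIntegral b)]
      | otherwise          = "\\x" ++ pad (showHex b "")
    pad s = replicate (2 - length s) '0' ++ s
```

With these definitions, root </> dir [100,111,99,115] </> file [99,97,102,233]
type-checks as a Path Abs File, whereas putting a file on the left of
</> is rejected by the compiler. A real package would presumably add
proper decoding on top of the byte-level representation; the escaping
in display is only a stand-in for that.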

Bas


