[Haskell-cafe] Unicoded filenames

Glynn Clements glynn.clements at virgin.net
Wed Sep 15 13:45:26 EDT 2004


Marcin 'Qrczak' Kowalczyk wrote:

> Here is what happens when a language provides only narrow-char API for
> filenames:

> > I have a filename as a UTF-8 encoded string. I need to be able to 
> > handle strange chars like accents, Asian chars etc.
> > 
> > Is there any way to create a file with that name? I only need it on Win32.
> 
> Windows uses UTF-16 for filenames, but provides a non-Unicode interface 
> for legacy applications; the standard open() function that OCaml's 
> open_out wraps appears to use the legacy interface.  The precise 
> codepage this uses is system-dependent, and AFAIK there's no way for a 
> program to determine what it is without calling out to the Win32 API, 
> but you can be pretty sure it won't be UTF-8.
> 
> In other words, there is no reliable way to use a filename containing 
> non-ASCII characters with OCaml's standard library.

No, this is what happens when an API imposes restrictions upon the
filenames which it can handle.

Essentially, it's due to two (or possibly three) factors:

1. The fact that Windows uses wide strings, rather than multi-byte
strings, for filenames.

2. The fact that Windows' compatibility interface is broken, i.e. it
only lets you access filenames which can be represented in the current
codepage (which, to me, is highly analogous to only supporting
filenames which are valid in the current locale).

3. Possibly that OCaml insists upon using UTF-8. [I don't know that
this is the case, but the fact that they specifically mention UTF-8
suggests that it might be.]
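
To illustrate point 2, here is a minimal sketch in Python. It assumes
that Python's built-in "cp1252" and "cp932" codecs are a reasonable
stand-in for two Windows codepages; `representable` is a hypothetical
helper written for this example, not part of any Win32 or OCaml API.
It only checks whether a name survives encoding into a given codepage,
which is the condition the legacy interface effectively imposes.

```python
def representable(filename: str, codepage: str) -> bool:
    """Return True if every character of filename can be encoded in codepage."""
    try:
        filename.encode(codepage)
        return True
    except UnicodeEncodeError:
        # At least one character has no mapping in this codepage, so a
        # narrow-char interface bound to it could not name the file.
        return False


if __name__ == "__main__":
    # Hypothetical filenames, chosen only for illustration.
    for name in ["report.txt", "naïve.txt", "日本語.txt"]:
        print(name, representable(name, "cp1252"), representable(name, "cp932"))
```

A name that encodes under one codepage may fail under another, so
whether a file can be reached through the narrow interface depends on
the system's configuration rather than on the file itself.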

IOW, this incident seems to argue against, rather than for, the
filenames-as-characters viewpoint.

-- 
Glynn Clements <glynn.clements at virgin.net>