announcing darcs

Alastair Reid alastair@reid-consulting-uk.ltd.uk
Thu, 10 Apr 2003 11:15:41 +0100


Alastair Reid <alastair@reid-consulting-uk.ltd.uk> writes:
>> [moved from cafe to libraries]
>>
>> For example, on a Unix system, /usr/lib/libcurl.so would be treated
>> something like this:
>>
>> (Just ["/","usr","lib"], "libcurl", Just "so")

Ketil Z Malde <ketil@ii.uib.no> writes:
> Isn't this a SMOP, writing functions:
>
>   dirname  :: FilePath -> String  -- or FilePath?
>   basename :: FilePath -> String
>   suffix   :: FilePath -> String

SMOP == small matter of programming?

Yes, it's pretty easy to do.  But that small matter of programming
gets repeated time and time again (with many shortcuts taken which
limit portability or make incorrect assumptions about what are legal
filenames) so I suggest that a high quality library be added.

I'm sure your functions weren't intended as a final, polished API
(though they look like the GNU make filename API which, since it is
now set in concrete, is as final and polished as it is ever likely to
get) but I'll point out some of the issues in the set of functions you
suggest.

1) What should the functions return when there is no dirname, no
   basename or no suffix?  An empty string suggests itself but can we
   then still distinguish between filenames like "foo." and "foo",
   "/foo" and "foo"?  

   This is why I used 'Maybe' - though maybe I didn't use it enough in
   my sketch?

2) It's often enough to split the dirname from the basename as you
   suggest but I sometimes find myself needing to access a
   subdirectory or parent directory.  So I write code like:

     dirname f ++ "/" ++ subdirname ++ "/" ++ notdir f

   or the cryptic

     reverse (dropWhile (/= '/') (reverse (dirname f))) ++ notdir f

   Both are fixed if there's a way to split the dirname into a list
   of directories so that we can add or remove bits at will.

3) We need a way to glue the various components back together again to
   eliminate those non-portable uses of '++ "/" ++' above.  

   The obvious thing is to abstract the directory separator (typically
   '/' or '\') but then you have to be careful when adding or removing
   components from filenames that are relative or absolute, have or
   lack a dirname, have or lack a suffix, etc.  

   I forget all the details of Windows filenames but you may also need
   to be careful when dealing with Windows drive letters and SMB mounted
   files on Windows.

   This is, in part, why I suggested that there be a way to parse
   FilePaths into a richer structure.  My thought was that as well as
   having operations to access the components, there would also be
   operations to modify the components (cf. record updates) - the idea
   being that if you want to change the suffix, you don't have to
   figure out all the things you want to remain constant, you just
   have to figure out the things you want to change.

   (The other reason for suggesting what the internal structure would
   be comes from my background in algebraic specification.  Given a
   structure which is semantically equivalent to a tuple (as I believe
   filenames ought to be viewed), we can just say it is equivalent to
   a tuple (a model-based specification) or we can give a set of
   equations in the algebraic specification style.  My experience is
   that, in this case, the model-based style scales better (i.e., is
   shorter) and is easier to understand (because it exploits existing
   understanding/intuition).)
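
To make points 1-3 concrete, here is a minimal sketch of the kind of
richer structure I have in mind.  It is only an illustration under
stated assumptions: it handles Unix-style paths with '/' as the
separator and ignores Windows drive letters and SMB names entirely.
The names Path, parsePath, showPath and inSubdir are hypothetical and
do not come from any existing library.

```haskell
import Data.Maybe (fromMaybe)

-- Hypothetical record mirroring the tuple
--   (Just ["/","usr","lib"], "libcurl", Just "so")
data Path = Path
  { pathDirs   :: Maybe [String]  -- Nothing: no dirname; a leading "/" marks the root
  , pathBase   :: String
  , pathSuffix :: Maybe String    -- Nothing: no '.'; Just "": trailing '.'
  } deriving (Eq, Show)

-- Split a string on every occurrence of a separator character.
splitOn :: Char -> String -> [String]
splitOn c s = case break (== c) s of
  (chunk, [])       -> [chunk]
  (chunk, _ : rest) -> chunk : splitOn c rest

-- Split a file name at its last '.'.
-- Assumption: a name such as ".bashrc" is treated as an empty
-- basename with suffix "bashrc"; a real library would need a policy here.
splitSuffix :: String -> (String, Maybe String)
splitSuffix file = case break (== '.') (reverse file) of
  (_, [])               -> (file, Nothing)
  (revSuf, _ : revBase) -> (reverse revBase, Just (reverse revSuf))

-- Parse a Unix-style FilePath into its components.
parsePath :: FilePath -> Path
parsePath fp = Path dirs base suffix
  where
    pieces = splitOn '/' fp       -- never empty
    dirs   = case init pieces of
      []        -> Nothing
      "" : rest -> Just ("/" : rest)
      ds        -> Just ds
    (base, suffix) = splitSuffix (last pieces)

-- Glue the components back together; no '++ "/" ++' at call sites.
showPath :: Path -> FilePath
showPath (Path dirs base suffix) =
  dirPart ++ base ++ maybe "" ('.' :) suffix
  where
    dirPart = case dirs of
      Nothing         -> ""
      Just ("/" : ds) -> '/' : concatMap (++ "/") ds
      Just ds         -> concatMap (++ "/") ds

-- Move a path into a subdirectory of its current directory.
inSubdir :: String -> Path -> Path
inSubdir sub p = p { pathDirs = Just (fromMaybe [] (pathDirs p) ++ [sub]) }
```

With this, changing the suffix is just a record update such as
p { pathSuffix = Just "a" }, and "foo." and "foo" parse to different
values (Just "" versus Nothing), as do "/foo" and "foo".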


--
Alastair Reid                 alastair@reid-consulting-uk.ltd.uk  
Reid Consulting (UK) Limited  http://www.reid-consulting-uk.ltd.uk/alastair/