apfelmus' interface degustation - System.FilePath.Find

Thu Jun 28 17:39:47 EDT 2007

> Bryan O'Sullivan wrote:
>> apfelmus wrote:
>
> Thank you for the review and constructive criticism.
> Although highly amusing, some of what you write is elliptical
> enough in style that I'm having trouble following a few of your points.
> Doubly so since I've thus far gone years without paying attention to
> Control.Applicative

Oui, you're right. I'm going to elaborate, although critique
incompréhensible is a must, otherwise it would quickly become obvious
that the critic has absolutely no clue ;)

>> The RecursionPredicate decides whether to
>> recurse into or a sub-directory or not (it could be mentioned explicitly
>> that the predicate is only invoked on directories).
>
> Good Point!

>> Cette soupe is much too monadic! In particular, we have the isomorphism
>> 
>>   (FileInfo -> a) ≅ FindClause a
>> 
>> witnessed by the function pair
>> 
>>   (f,g) = ((`liftM` fileInfo), evalClause)
>> 
>> Of course, the encapsulation was made to allow a formulation like
>> 
>>   extension ==? ".hs"
>> 
>> which does not mention the FileInfo parameter. But alas, this can also
>> be achieved more naturally by appealing to the
>> 
>>   instance Control.Applicative ((->) a)

The intention is to present the predicates (FileInfo -> a) in a way that
does not mention the parameter over and over. I mean, one could write
the example as

  \info -> (extension info == ".hs") || (extension info == ".lhs")

but threading the parameter around will quickly become a nuisance with
more complex queries. One solution is to use the Reader monad, and in
essence that's what System.FilePath.Find currently does. But I think that
 a) The Reader is better seen as applicative functor than as monad
 b) Making (FindClause a) a mere type synonym of (FileInfo -> a)
    has the benefit that the user can choose whether he wants to use
    monads or applicative functors via the respective instances
    or whether he does not.

Programming with applicative functors is like programming with monads
but you're only allowed to use  return  and `ap`. The example can be
written for monads as

 return (||)
  `ap` (return (== ".hs") `ap` extension)
  `ap` (return (== ".lhs") `ap` extension)

or for general applicative functors as

 (||) <$> ((== ".hs") <$> extension) <*> ((== ".lhs") <$> extension)

In the end, one will probably use the custom combinators (==?) and (||?)
anyway, so that it doesn't matter whether it's monad or applicative or
whatever. But not making (FindClause a) opaque gives more freedom to the
user.

Note that providing functionality by making things instances of very
powerful classes should be documented explicitly. A recent question on
haskell-cafe was about Data.Tree and one can argue that the innocent
looking instances for Foldable, Functor and Traversable are maybe 75% of
the functionality that Data.Tree offers :)

>> Abstracting the concrete representation away into FindClause hinders
>> reuse of already existing functionality from System.FilePath,
>
> Yes, that's unfortunate.

>> The monad could make sense if the predicate might involve not only the
>> file status but also looking inside the file.
>
> How would you suggest doing so? Just a simple unsafeInterleaveIO
> readFile?
>
>> Returning all files with a
>> certain magic number would be such a use case but System.FilePath.Find
>> currently does not offer zis possibilité.
> 
> I'm certainly all for improving it, so this kind of suggestion is most welcome.

I guess that using (FileInfo -> IO Bool) and providing a specialized
(FileInfo -> Bool) version is probably a safe way. But with IO, the
traversal could even change files in place, which is probably not a good
idea.

Maybe unsafePerformIO is the best solution, because you may safely close
the file after having evaluated

  predicate (unsafePerformIO $ hGetContents handle) (fileinfo)

to weak head normal form, i.e. True or False. I think it fits perfectly.

>> Un autre point is that the library offers a function  fold  that is
>> almost a foldl' over all files in a directory tree, but not quite: here,
>> the fold goes over FileInfos but the list is a simple list of FilePaths.
>> One has to foldM over the latter and get the file infos again to achieve
>> the same effect.
> 
> But the FilePath is embedded in a FileInfo, so a double traversal is unneeded.

Yes. I've not been clear, I mean the other way round. Using
System.FilePath.Find.fold gives you both file status and file path but
the ought-to-be equivalent approach of using foldl' on the list returned
by find only gives the file path but no file status. So, the suggestion
is to make find return a list of FileInfo

  find :: ... -> IO [FileInfo]

If you only want a list of file paths, you can always do

  liftM (map filepath) . find ...

Of course, this leads to the question whether  find  should be factored
even more into generate & prune

  find r p dir = map filepath . filter p . directoryWalk r dir

with the intention that System.FilePath.Find only exports directoryWalk.
Which leads to the question whether directoryWalk can be factored as
well which in turn leads to the next question:

>> Le maître de cuisine thinks that these quirks stem from the fact that
>> the file system lacks a purely functional design. The observation is
>> that given persistence, reading does not need to be monadic. In other
>> words, directory tree traversal might well return a list of files
>> 
>>   data File = File { contents :: String, status :: FileStatus }
>> 
>> with contents and status. More generally, the traversal could be a
>> side-effect-free traversal of a pure data structure.
>
> The darcs authors already tried this, and gave up on the idea.
> Once you have a pure data structure, you start developing notions that
> it makes sense to manipulate it, and then all is lost once you turn
> your mind to applying those manipulations to the real world.

Yes, the data structure ought to be read-only. But by making it an
abstract data type with proper access functions, maybe this goal can be
achieved. In any case, with such a data structure,  directoryWalk  can
be factored as well.

In particular, I have something in mind, based on the following
 Question: "Given a directory tree, what are you going to do with it?".
 Answers : "Well, I rename every file" (=> map)
           "Calculate total size" (=> fold)
The assumption is that everything you'll ever do with a huge directory
tree is to map or fold it. So, here comes my crazy speculation: make the
directory tree a phantom type

  data DirTree a
  type DirectoryTree = DirTree FileInfo

and implement Functor, Foldable and Traversable for it. So, printing out
all files would become

  Data.Foldable.mapM_ (putStrLn . filepath) :: DirTree FileInfo -> IO ()

In conclusion, there are several ways to generalize an interface. One
way is to add more options and parameters to a function. But the other
way is to shatter a monolithic function into tiny pieces that can be
reassembled and composed at will. I think that the latter is the spirit
and the source of power of functional programming.

Regards,
apfelmus