Proposals for changes to searching behaviour

Tue, 10 Dec 2002 14:49:29 -0500

Hi Simon,
CC Glasgow-Haskell-Users

> I've never quite managed to figure out what VPATH is for; one reason is
> that I've never got it to cooperate well with automatically-generated
> dependencies.

As long as one uses automatic variables in all commands, it seems to
work fine. Well, I have at least not had any trouble in that particular
respect, for what that is worth.

One application of VPATH is to select between different sources depending
on the platform. Sure, there are alternatives, but VPATH can be quite
convenient, and if used properly, the overall source structure is quite
clear and tidy.

For example, the current (unfortunate) "standard" practice for dealing
with StdDIS for GreenCard is to bundle it with every application/library
that has some GreenCard sources. If you want your app/library to work
with more than one Haskell system, you need one StdDIS for each.
Alastair Reid's Xlib is a good example. It roughly has this structure:

    src
        <Haskell system independent Haskell & GreenCard sources>
        ghc
            StdDIS.gc   -- StdDIS for GHC
        hugs
            StdDIS.gc   -- StdDIS for Hugs

Ignore that in an ideal world there should not be any reason to bundle
StdDIS with apps/libs: this is just an example of a source file that is
different for different Haskell systems. As long as Haskell systems
provide different features, there is always going to be a need for this
kind of arrangement. One could imagine wanting a heavily optimized
version of one module for GHC (using unboxed whatever etc.), as well as
plain Haskell 98 sources for other systems, for example.

> If there's a specific restriction in GHC that prevents
> the use of VPATH, could you describe what it is?

No real restriction. But consider what will happen if we try to use
the above scheme for some module A.B.C.M1 which is part of a module A.B.C.

If I have to use a directory hierarchy, I'd find it natural to organize
the sources like this:

(A)

    src
        A
            B
                C.hs
                C
                    ghc
                        M1.hs
                    hugs
                        M1.hs

But instead, it seems as if I have to do it like this:

(B)

    src
        A
            B
                C.hs
        ghc
            A
                B
                    C
                        M1.hs
        hugs
            A
                B
                    C
                        M1.hs

I'm sorry, but as far as I'm concerned, this really is a mess.

As an aside and an example of how different tool conventions can interfere,
suppose our sources are under CVS control, and that we want to move things
around a bit. Renaming files is bad enough in CVS, but changing the actual
source hierarchy is really painful.

All I really want is

(C)

    src
        C.hs
        ghc
            C.M1.hs
        hugs
            C.M1.hs

or possibly, if prefixes can't be dropped:

(D)

    src
        A.B.C.hs
        ghc
            A.B.C.M1.hs
        hugs
            A.B.C.M1.hs

I don't think it is unreasonable to want to organize sources in fashion
(C) or (D).
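
To make the difference between the layouts concrete, here is a minimal
Haskell sketch of the file name each convention would imply for a given
module name. The function names (hierarchicalPath, flatPath,
flatPathDropping) are made up for illustration; they do not correspond
to anything GHC or Hugs actually implements.

```haskell
import Data.List (intercalate, stripPrefix)
import Data.Maybe (fromMaybe)

-- Split a module name such as "A.B.C.M1" into ["A","B","C","M1"].
components :: String -> [String]
components s = case break (== '.') s of
  (c, [])       -> [c]
  (c, _ : rest) -> c : components rest

-- Layout (B): one directory per component, relative to src/ or to a
-- per-system directory such as src/ghc/.
hierarchicalPath :: String -> FilePath
hierarchicalPath m = intercalate "/" (components m) ++ ".hs"

-- Layout (D): the full dotted module name is the file name.
flatPath :: String -> FilePath
flatPath m = m ++ ".hs"

-- Layout (C): like (D), but with a known prefix dropped first.
-- If the module does not start with the prefix, the name is kept whole.
flatPathDropping :: String -> String -> FilePath
flatPathDropping prefix m =
  fromMaybe m (stripPrefix (prefix ++ ".") m) ++ ".hs"
```

For the module A.B.C.M1, hierarchicalPath gives "A/B/C/M1.hs", flatPath
gives "A.B.C.M1.hs", and flatPathDropping "A.B" gives "C.M1.hs".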

I'm currently using (A) by way of a different (manually implemented)
mechanism than VPATH. That took quite some time to get right though,
and even though I'm happy with the mechanism since it is quite general,
using VPATH would have saved me some time.

Now, as it turns out, GreenCard does not seem to support hierarchical
modules at this point, so in the case of Xlib, it seems as if it's going
to be necessary to make StdDIS a top-level module, with some kind of
prefix naming convention to avoid clashes with other StdDISs.
I.e. something like:

    src
        Graphics
            X
                <Haskell system independent Haskell & GreenCard sources>
        ghc
            Graphics_X_StdDIS.gc   -- StdDIS for GHC
        hugs
            Graphics_X_StdDIS.gc   -- StdDIS for Hugs

Am I the only one who finds this kind of source organization totally
unorganized?

> It comes down to this: the Haskell system has to find files (normally
> interface files).  Either these files are named according to a simple
> convention, or you have to tell the compiler more about where they live.
> You would rather we moved towards the latter position.

In principle, yes. But there are of course issues if one still wants to
use tools like Make. Powerful mechanisms like implicit rules pretty
much assume fairly straightforward mappings between names of
generated files and source files.

> There's a compromise (well sort of).  If you don't need to use --make,
> then GHC will let you use whatever filenames you like for the sources;
> the only restriction is that it has to be able to find the interfaces,
> so you have to arrange they get placed where GHC can find them either by
> using the -ohi flag or by having the build system move them.  
> 
> Even if you do need to use --make or GHCi, then I think you can still
> specify all the source files on the command line.  

This might be a possibility, although I do think (and I just spoke to
John Peterson, and he told me I could officially say that he agrees! ;-)
that a mechanism allowing prefixes to be associated with directories in
the search path makes a lot of sense and would be easier.
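
As a rough illustration of what I mean by associating prefixes with
directories, here is a small Haskell sketch. It is only a sketch under
my own assumptions: the Entry type and the candidates function are
hypothetical names, the flat dotted file naming of layout (C) is
assumed, and none of this is something GHC provides today.

```haskell
import Data.List (stripPrefix)
import Data.Maybe (mapMaybe)

-- One (hypothetical) search path entry: modules whose name starts with
-- entryPrefix are looked for in entryDir under the remaining name.
data Entry = Entry { entryPrefix :: String, entryDir :: FilePath }

-- All places a module could live, in search path order. A real tool
-- would pick the first candidate that exists on disk.
candidates :: [Entry] -> String -> [FilePath]
candidates path m = mapMaybe try path
  where
    try (Entry p d) = do
      -- An empty prefix matches every module and strips nothing.
      rest <- if null p then Just m else stripPrefix (p ++ ".") m
      pure (d ++ "/" ++ rest ++ ".hs")
```

With the search path [Entry "A.B" "src", Entry "A.B" "src/ghc"], the
module A.B.C.M1 yields the candidates "src/C.M1.hs" and
"src/ghc/C.M1.hs", and A.B.C yields "src/C.hs" and "src/ghc/C.hs".
Swapping "src/ghc" for "src/hugs" selects the other variant of M1
without touching the sources.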

Right now, I don't know what implications your suggestion would have
for implicit rules in Makefiles, for instance.

Also, what are the
implications once we actually want to INSTALL the interface files?
Doesn't one need to somehow re-create the "fully qualified name" of the
file then?

And what about other tools? I could see how e.g. Hugs in principle could
adopt something like what we have discussed, and that it would work in
a similar manner. I'm less sure in this case.

Maybe I should point out that I have no quibbles with conventions like
"A/B/C" or whatever once we talk about installed libraries, be it as part
of a package or a plain import directory hierarchy.

> Actually, I've just tried this and it almost works.  The interface files
> get put in the current directory and have all but the last component of
> the module name stripped off, which is fine as long as you don't have
> any other modules with the same last component.

OK, I'm confused. How does GHC find the modules in this case? When I
tried similar things a while ago, GHC didn't seem to want to find
the interface to a module "A.B.C" in a file "C.hs" in the current working
directory.

> I think perhaps fixing
> this and documenting the behaviour would be a good step forward.  Then
> would there be any reason to need the other extensions?

I agree that this is interesting. But I'm not sure what the implications
are. Is this really simpler than what we were discussing before?

But fundamentally you are right, of course. The whole issue is how to
get the compiler to find the interface files. Assuming a correspondence
between module names and file names isn't such a great idea to begin with
for a number of reasons. And in principle, compiler implementors should
be free to choose whatever mechanisms/conventions they prefer for this.
The interface files are really part of the compiler's internal mechanisms.

/Henrik

-- 
Henrik Nilsson
Yale University
Department of Computer Science
nilsson@cs.yale.edu