Mutually-recursive/cyclic module imports

Sun Aug 17 19:23:41 EDT 2008

Isaac Dupree wrote:
> Duncan Coutts wrote:
>> [...]
>>
>> I'm not saying it's a problem with your proposal, I'd just like it to be
>> taken into account. For example do dependency chasers need to grok just
>> import lines and {-# SOURCE -#} pragmas or do they need to calculate
>> fixpoints.

Actually, good point, Duncan, that got me thinking about 
what we need in order to obviously not to lose much/any of 
the .hs-boot efficiency.  (warning: another long post ahead, 
although the latter half of it is just an example from GHC's 
source) [and I re-read my post and wasn't sure about a few 
things, but maybe better to get feedback first -- please 
tell me if I'm being too verbose somewhere, too]

Let's look at the total imports of a .hs and its .hs-boot, 
as they currently are for GHC.  Either can be non-SOURCE 
imports (let's call them NOSOURCE), SOURCE imports, or not 
importing that.
.hs:NOSOURCE, .hs-boot:NOSOURCE : okay
.hs:NOSOURCE, .hs-boot:SOURCE : okay
.hs:NOSOURCE, .hs-boot:not-imported : okay
.hs:SOURCE, .hs-boot:NOSOURCE : bad, if the .hs needs 
SOURCE, then probably so does the .hs-boot
.hs:SOURCE, .hs-boot:SOURCE : okay
.hs:SOURCE, .hs-boot:not-imported : okay
- the .hs-boot importing a module that the .hs doesn't is 
invalid, or at least useless [actually, see later example -- 
there may be reasons for this, but in that case, it doesn't 
hurt to also import the module in the .hs (assuming there's 
no syntactic/maintenance burden), and it provides better 
automatic error-checking to do so]

Given the limited amount of information a .hs-boot file (or 
SOURCE-imported file, in my scheme) needs for being a 
boot-file, there is no advantage to import the modules it 
depends on as NOSOURCE.  The compiler just has to be clever 
enough to ignore imports of functions that it can't find out 
the type of.  Also, currently using SOURCE requires the 
imported module to have a .hs-boot.  But it should work fine 
to look for a .hi and use that in the absence of .hi-boot, 
because it has strictly a superset of the information (so 
that my statement that "SOURCE is superior to NOSOURCE when 
it works" can be truer, for the sake of demonstration). 
[oops! I was wrong, it may need to NOSOURCE-import on 
occasion to find out a function's type - more on that in a 
later post?]

Now, since the .hs-boot SOURCE vs NOSOURCE has been 
collapsed, I think we can move mostly-all .hs-boot info into 
the .hs file.  If the .hs-boot file had imported something, 
the corresponding import in the .hs is imported with 
{-#SOURCE_FOLLOW#-} (in addition to {-#SOURCE#-} or 
{-#NOSOURCE#-}); otherwise it's imported with 
{-#SOURCE_NOFOLLOW#-} (ditto).  For demonstration, I'll 
assume that all imports are annotated this way, with two 
bits of information.  Presumably all imports that aren't 
part of an import loop are NOSOURCE (which includes all 
cross-package imports).

Now let's look at the dependency chaser.
NOSOURCE imports must not form a loop.  They form dependency 
chains as normal.
SOURCE imports depend on either a .hi or a .hi-boot for the 
imported module.
When a X.hi-boot is demanded:
only SOURCE_FOLLOW imports are dependency-chased from X.hs, 
through any .hs modules that don't already have a .hi or 
.hi-boot.
In the case where .hs-boots worked, this *can* avoid cycles. 
  If this SOURCE_FOLLOW dependency DAG doesn't have any 
cycles, then it should be as simple as calling (the 
fictional) `ghc -source X.hs` to produce X.hi.  If there are 
cycles, and it is sometimes necessary*, GHC needs to be 
slightly smarter and be able to produce all the .hi-boot 
files at once from any graph SCCs (loops) that prevent it 
from being a DAG (e.g., `ghc -source X.hs Y.hs` to produce 
X.hi-boot and Y.hi-boot).  Note that it doesn't need to be 
particularly smart here -- e.g., no type inference is done.

*necessary loops:
example 1, the data/declarations literally loop:
module X1 where
{ import Y1(Y); data X a = End a | Both Y Y; }
module Y1 where
{ import X1(X); data Y = Only (X (Maybe Y)); }
(or kind annotations could be required for these loops in 
general, e.g. data X (a :: *) = ...)
[hmm, in this case actually all we need is the data 
left-hand-side, so we could do this in two stages.  But that 
wouldn't work out so well if their RHSs contained 
{-#UNPACK#-}!SomeNewtypeForInt where SomeNewtypeForInt was 
from the other module.  But that's an optimization that it 
might be okay not to do, as long as it was consistently not 
done both for .hi-boot and .hi/.o; and it could perhaps be 
doable]

example 2, there are just too many back-and-forths:
module X2 where
{ import Y2(Yb); data Xa = Xa; data Xc = Xc Yb; }
module Y2 where
{ import X2(Xa,Xc); data Yb = Yb Xa; data Yd = Yd Xc; }
This second one "could" also be accomplished if multiple 
different .hs-boots were allowed per .hs,
although it doesn't seem worth the annotation!! such as 
using SOURCE_FOLLOW[0] or [1], [2]...
I'm not even going to try to write that! [oh wait, 
SOURCE[0->1] = SOURCE, SOURCE[1->1] = SOURCE_FOLLOW, 
SOURCE[1->null] = SOURCE_NOFOLLOW, maybe something can be 
done like that, more complicated in one way but perhaps a 
bit sounder in another]

Now, SOURCE_NOFOLLOW is a bit of a hack, for a couple reasons:
- instances (especially orphans, and especially overlapping 
instances) may not always be imported when they should be.
- There may be some information that could be 
SOURCE-imported from the module if all its imports were 
SOURCE_FOLLOW, but not enough information was imported that 
way due solely to SOURCE_NOFOLLOW.  That's probably okay 
though; after all, the presence/absence of explicit type 
signatures should have the same effect.  Any information 
that the shallow -hi-boot-making search can't figure out, 
just doesn't go into the .hi-boot (possibly leading to 
erroring later... perhaps the .hi-boot could store info 
saying which information existed but it couldn't figure out, 
to enhance error messages if that info is ever demanded.)

Obviously, Template Haskell can only be run if 
NOSOURCE-imported from another module.

The stupid dependency chaser (the most complicated thing it 
does besides parsing import statements is computing graph 
SCCs) will, of course, still find a few more changed 
dependencies than really need to be recompiled; as always, 
this is where GHC's fancier recompilation checking will come 
into effect.

Some compilers might benefit from (require?) explicit import 
or export lists in some places... also I wonder if perhaps 
items in export lists should be markable as whether they're 
exported to SOURCE-importers (although it doesn't seem 
necessary)

Obviously, annotating every import with both [NO]SOURCE and 
SOURCE_[NO]FOLLOW is unreasonable!  So let's look at 
inferring them.
Any import that's not explicitly annotated with [NO]SOURCE 
can default to NOSOURCE if it's not part of an import-loop, 
or SOURCE if it is.  NOSOURCE is allowed as a pragma here as 
well as SOURCE.  That is, the dependency chaser assumes 
NOSOURCE, and if it finds a loop of imports that aren't 
explicitly SOURCE imports, it converts all that aren't 
*explicitly* NOSOURCE into SOURCE imports (if they're all 
explicitly NOSOURCE, it's an error).
Since SOURCE_NOFOLLOW can easily break things, it really 
shouldn't be the default. (And there's never any need of it 
for imports of modules that aren't part of the current 
module cycle).  However, we really don't want to have to 
specify it on all imports within the loop -- .hs-boots 
manage to only specify for modules that *are* needed.  I 
suggest that SOURCE_[NO]FOLLOW be allowed as a top-level 
pragma that says all (following?) imports are annotated with 
that it they're not explicitly annotated SOURCE_[NO]FOLLOW. 
  For example, let's take some random file from GHC's source 
that has a [l]hs-boot file: compiler/deSugar/DsExpr

current lhs-boot:
\begin{code}
module DsExpr where
import HsSyn    ( HsExpr, LHsExpr, HsLocalBinds )
import Var      ( Id )
import DsMonad  ( DsM )
import CoreSyn  ( CoreExpr )

dsExpr  :: HsExpr  Id -> DsM CoreExpr
dsLExpr :: LHsExpr Id -> DsM CoreExpr
dsLocalBinds :: HsLocalBinds Id -> CoreExpr -> DsM CoreExpr
\end{code}

current lhs: lots of imports, it will become obvious

proposed new lhs, just like old lhs but with a few pragmas 
inserted:
\begin{code}
-- ...
module DsExpr ( dsExpr, dsLExpr, dsLocalBinds, dsValBinds, 
dsLit ) where
{-# SOURCE_NOFOLLOW #-}
#include "HsVersions.h"

import Match
import MatchLit
import DsBinds
import DsGRHSs
import DsListComp
import DsUtils
import DsArrows
import {-# SOURCE_FOLLOW #-} DsMonad
import Name

#ifdef GHCI
import PrelNames
         -- Template Haskell stuff iff bootstrapped
import DsMeta
#endif

import {-# SOURCE_FOLLOW #-} HsSyn
import TcHsSyn

-- NB: The desugarer, which straddles the source and Core 
worlds, sometimes
--     needs to see source types
import TcType
import Type
import {-# SOURCE_FOLLOW #-} CoreSyn
import CoreUtils

import DynFlags
import CostCentre
-- hmm, actually Var was not imported by the lhs,
-- only Id (which imports Var) !  It looks okay to
-- just annotate the Id import here:
import {-# SOURCE_FOLLOW #-} Id
-- Are there times where this would ever
-- be a terrible problem?  Well, we could have
-- added a line
--import {-# SOURCE_FOLLOW #-} Var ( Id )
-- instead, which would not hurt much.
-- (if Var.Id were a different type than Id.Id,
-- compiling this DsExpr module would give a
-- simple ambiguity error, no risk of
-- hs vs. hs-boot inconsistency)
import PrelInfo
import DataCon
import TysWiredIn
import BasicTypes
import PrelNames
import SrcLoc
import Util
import Bag
import Outputable
import FastString
\end{code}
...

-Isaac