planning for ghc-6.10.1 and hackage [or: combining packages to yield new type correct programs]

Simon Marlow marlowsd at gmail.com
Thu Oct 2 07:41:38 EDT 2008


Don Stewart wrote:

> Here's a summary of why this is non-trivial,
> 
> * We're trying to compose packages on the user's machine to yield new
>   type-correct programs.
> 
> * We're using cabal dependencies to decide when it is safe to do this.
>   Hopefully we don't rule in any type-incorrect combinations, nor rule
>   out too many type-correct combinations.
> 
> * This is scary - in fact, we think the package system admits
>   type-incorrect programs (based on Typeable, or initialised global state), 
>   as it is similar to the runtime linking problem for modules.

I think what you're referring to is the problem that occurs if the program 
links in more than one copy of Data.Typeable, which would then invalidate 
the assumptions that make Data.Typeable's use of unsafeCoerce safe.  I 
wouldn't call this "type-incorrect" - it's a violation of assumptions made 
by Data.Typeable, not type-incorrectness in the Haskell sense.
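
To make the assumption concrete, here's a rough sketch in the style of a
Typeable-based cast (simplified, not the actual library source): equal
TypeReps are treated as a licence to unsafeCoerce, which is only sound if
every part of the program agrees on what those TypeReps mean.

  import Data.Maybe (fromJust)
  import Data.Typeable (Typeable, typeOf)
  import Unsafe.Coerce (unsafeCoerce)

  -- Simplified sketch, not the real Data.Typeable code: the cast succeeds
  -- when the TypeRep of the argument matches the TypeRep of the requested
  -- result type, and then unsafeCoerce bridges the gap.  Two independent
  -- copies of the TypeRep machinery could break the "equal reps imply
  -- equal types" assumption that makes this safe.
  castSketch :: (Typeable a, Typeable b) => a -> Maybe b
  castSketch x = r
    where
      r = if typeOf x == typeOf (fromJust r)   -- typeOf never forces its argument
            then Just (unsafeCoerce x)
            else Nothing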

But you'll be glad to know this doesn't happen anyway, because 
Data.Typeable's state is held by the RTS these days, for exactly this reason.

However, there are libraries which do have private state (e.g. 
System.Random).  We'd prefer not to have more than one copy of the state, 
but it's not usually fatal: in the case of System.Random, different clients 
might get streams of random numbers initialised from different seeds, but 
that's indistinguishable from sharing a single stream of random numbers. 
Often this global-state stuff is for caching, which works just fine when 
multiple clients use different versions of the library - it's just a bit 
less efficient.
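
For illustration, here's a minimal sketch of the sort of top-level state in
question (invented names, not System.Random's actual source).  If two
versions of the defining library are linked, each copy gets its own IORef:
two independent seeds or caches rather than one.

  import Data.IORef (IORef, newIORef)
  import System.IO.Unsafe (unsafePerformIO)

  -- The usual "private top-level state" idiom (illustrative only).  NOINLINE
  -- ensures one IORef per linked copy of this module; with two copies of the
  -- library there would be two seeds, which is observably the same as two
  -- independently seeded streams of random numbers.
  {-# NOINLINE theGlobalSeed #-}
  theGlobalSeed :: IORef Int
  theGlobalSeed = unsafePerformIO (newIORef 42)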

> * We use constraint solving to determine when composition is safe, by looking at 
>         "package >= 3 && < 4"
>   style constraints. That is, we try to guess when the composition
>   would yield a type-correct program.

The way to make this completely safe is to ensure that the resulting 
program has only one copy of each module, rather than one version of each 
package - the latter is only an approximation, because a module might also 
have moved to a package with a different name.

Now, we want to relax this in various ways.  One way is the base-3/base-4 
situation, where base-3 has a lot of the same modules as base-4, but all 
they do is re-export stuff from other packages.  How do we know this is 
safe?  Well, we don't - the only way is to check whether the resulting 
program typechecks.
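
For instance, a compatibility module can keep the old name while containing
nothing but re-exports from wherever the code now lives.  This is only a
hedged sketch with illustrative module and package names, not base-3's
actual source:

  {-# LANGUAGE PackageImports #-}

  -- Illustrative shim: the module defines nothing itself, it simply
  -- re-exports what now lives in another package (here, hypothetically,
  -- the syb package under the same module name).
  module Data.Generics.Basics ( module SYB ) where

  import "syb" Data.Generics.Basics as SYB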

Another way we want to relax it is when a dependency is "private" to a 
package; that is, the package API is completely independent of the 
dependency, and hence changing the dependency cannot cause compilation 
failure elsewhere.  We've talked in the past about how it would be nice to 
distinguish private from non-private dependencies.
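
As a hedged illustration (an invented module, not a real package), a
dependency is private in this sense when none of its types leak into the
exposed API:

  -- Hypothetical module: containers is used internally, but no containers
  -- type appears in the exported signature, so the containers version this
  -- was compiled against cannot affect a client's type checking.
  module WordCount (wordCounts) where

  import qualified Data.Map as Map   -- the "private" dependency

  wordCounts :: String -> [(String, Int)]
  wordCounts = Map.toList . Map.fromListWith (+) . map (\w -> (w, 1)) . words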

Let's be clear: there are only two ways that something could "go wrong" 
when composing packages:

  1. the composition is not type-correct; you get a compile-time error

  2. some top-level state is duplicated; if the programmer has been
     careful in their use of unsafePerformIO, then typically this won't
     lead to a run-time error.

So it's highly unlikely you end up with a program that goes wrong at 
runtime, and in those cases arguably the library developer has made 
incorrect assumptions about unsafePerformIO.

> * Again, we're using constraint solving on this language to
>   determine when composition of Haskell module sets (aka packages) would
>   yield type-correct Haskell programs
>   
>   All without attempting to do type checking of the interfaces between
>   packages -- the very thing that says whether this is sound!

True - but we already know that package/version pairs are only a proxy for 
interfaces, and subject to user error.  If a package says that it compiles 
against a given package/version pair, there's no guarantee that it actually 
does; that's up to the package author to ensure.  Now obviously 
we'd like something more robust here, but that's a separate problem - not 
an unimportant one, but separate from the issue of how to make 
cabal-install work with GHC 6.10.1.

cabal-install has to start from the assumption that all the dependencies 
are correct.  Then it can safely construct a complete program by combining 
all the constraints, and additionally ensuring that the combination has no 
more than one of each module (and possibly relaxing this restriction when 
we know it is safe to do so).
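
As a rough sketch of the constraint-combining step (invented types, nothing
like cabal-install's actual solver), each dependency contributes a version
interval per package name, and the intervals are simply intersected; an
empty intersection means there is no consistent plan.

  import qualified Data.Map as Map

  -- Illustrative only: a ">= lo && < hi" constraint as a half-open interval.
  type Version = [Int]
  data Range = Range { lower :: Version, upper :: Version }   -- [lower, upper)
    deriving Show

  -- Intersect two ranges; Nothing means the constraints are unsatisfiable.
  intersectR :: Range -> Range -> Maybe Range
  intersectR (Range l1 u1) (Range l2 u2)
    | lo < hi   = Just (Range lo hi)
    | otherwise = Nothing
    where
      lo = max l1 l2
      hi = min u1 u2

  -- Fold every (package, constraint) pair mentioned by the chosen packages
  -- into one range per package, failing as soon as an intersection is empty.
  combine :: [(String, Range)] -> Maybe (Map.Map String Range)
  combine = foldr step (Just Map.empty)
    where
      step (name, r) acc = do
        m <- acc
        case Map.lookup name m of
          Nothing -> Just (Map.insert name r m)
          Just r' -> do r'' <- intersectR r r'
                        Just (Map.insert name r'' m)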

> * So, the solver for cabal-install has to be updated to allow the same 
>   package to have multiple, conflicting versions, as long as version X
>   depends on version Y, and then not reject programs that produce this
>   constraint.

Right.

> * This is non-trivial, but we think this refactoring is possible, though
>   it is hard. Ultimately we're still making optimistic assumptions about 
>   when module sets can be combined to produce type-correct programs, 
>   and conservative assumptions, at the same time.
> 
>   What we need is a semantics for packages, that in turn uses a
>   semantics for modules, that explains interfaces in terms of types.

The semantics is quite straightforward: a module is identified by the pair 
(package-id, module name), and then you just use the Haskell module system 
semantics.  That is, replace all module names in the program with 
(package-id, module name) pairs according to which packages are in scope in 
the context of each module, and then proceed to interpret the program as in 
Haskell 98.
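
Put another way (a minimal sketch with invented type names, not GHC's
internal ones), the "original name" of a module gets the package identifier
baked in:

  -- Illustrative types only.  A module in the whole-program view is the
  -- pair of the package it came from and its source-level name; two modules
  -- with the same source name but different packages are simply different
  -- modules.
  data PackageId = PackageId
    { pkgIdName    :: String   -- e.g. "base"
    , pkgIdVersion :: String   -- e.g. "3.0.3.0"
    } deriving (Eq, Ord, Show)

  newtype ModuleName = ModuleName String
    deriving (Eq, Ord, Show)

  data OriginalModule = OriginalModule PackageId ModuleName
    deriving (Eq, Ord, Show)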

The main problem with looking at things this way is that you need to see 
the whole program - which is what I've been arguing against in the context 
of instances.  So I agree that looking for a semantics for packages that 
lets you treat them as an abstract entity would be useful.  Still, the 
above interpretation of packages is a good starting point, because it tells 
you whether a higher-level semantics is really equivalent.

> * The end result is that cabal-install should be able to find automated
>   install plans for packages that ask for base-3, even when base-4 is on
>   the system as well, and it uses pieces of base-3 libraries and base-4
>   libraries. Some more programs will work than if we didn't ship base-3.

So I'm not sure exactly how cabal-install works now, but I imagine you 
could search for a solution with a backtracking algorithm, and prune 
solutions that involve multiple versions of the same package, unless those 
two versions are allowed to co-exist (e.g. base-3/base-4).  If backtracking 
turns out to be too expensive, then maybe more heavyweight 
constraint-solving would be needed, but I'd try the simple way first.
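
Here's a hedged sketch of what that simple backtracking search might look
like (invented types and a toy coexistence rule; the real cabal-install
representation is different).  Laziness means "take 1" of the result gives
the first plan found, and further backtracking happens only if you ask for
more.

  type PkgName = String
  type Version = [Int]

  data Pkg = Pkg
    { pkgName    :: PkgName
    , pkgVersion :: Version
    , pkgDeps    :: [(PkgName, Version -> Bool)]   -- dependency + version predicate
    }

  -- Versions of these packages may coexist in one plan (the base-3/base-4 case).
  coexistOK :: PkgName -> Bool
  coexistOK n = n == "base"

  solve :: [Pkg]                          -- available packages
        -> [(PkgName, Version -> Bool)]   -- outstanding goals
        -> [Pkg]                          -- packages chosen so far
        -> [[Pkg]]                        -- all consistent install plans
  solve _     []                  chosen = [chosen]
  solve avail ((name, ok):goals)  chosen =
    [ plan
    | p <- avail
    , pkgName p == name
    , ok (pkgVersion p)
    , all (compatible p) chosen           -- prune conflicting versions
    , plan <- if any (same p) chosen
                then solve avail goals chosen                       -- already picked
                else solve avail (pkgDeps p ++ goals) (p : chosen)  -- commit, recurse
    ]
    where
      same p q       = pkgName q == pkgName p && pkgVersion q == pkgVersion p
      compatible p q = pkgName q /= pkgName p
                    || pkgVersion q == pkgVersion p
                    || coexistOK (pkgName p)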

What happens with automatic flag assignments?  Presumably we can decide 
what the flag assignment for each package is up-front?

Cheers,
	Simon

