[Haskell-cafe] A new cabal odissey: cabal-1.8 breaking its own neck by updating its dependencies

Paolo Giarrusso p.giarrusso at gmail.com
Wed Sep 15 14:21:26 EDT 2010


Hi Duncan,
first of all, thanks for taking the time to answer personally.

On Wed, Sep 15, 2010 at 18:33, Duncan Coutts
<duncan.coutts at googlemail.com> wrote:
> On 13 September 2010 20:54, Paolo Giarrusso <p.giarrusso at gmail.com> wrote:
>> On Sun, Sep 12, 2010 at 20:46, Tillmann Rendel
>> <rendel at mathematik.uni-marburg.de> wrote:

> 1. upgrading packages can break dependencies (and Cabal does not do a
> lot to prevent/avoid this)
>
> 2. cabal ought to allow using multiple versions of a single package in
> more circumstances than it does now

I reply below with a few concerns - in particular, I explain why IMHO
your proposal for 2. does not work well with cross-module inlining.

> Both of these issues are known to the Cabal hackers (i.e. me and a few
> other people). I'll share my views on the problem and the solution.

Ah-ah! Could _at least_ the 1st issue be added to the FAQ? Something like:
"A version of package A was rebuilt [for an upgrade of its
dependency], and stuff depending on A started causing linking errors!"

I am even ready to send patches.

> 1. This is certainly a problem. The current situation is not awful but
> it is a bit annoying sometimes. We do now accurately track when
> packages get broken by upgrading dependencies so it should not be
> possible to get segfaults by linking incompatible ABIs.

I had a slightly different counterexample, but maybe it's purely a GHC
bug; I use GHC 6.10.4 and the latest Cabal/cabal-install. Are
dependencies computed by Cabal or ghc-pkg? If they are computed by
Cabal, I think I have a bug report.

At some point I unregistered a package with ghc-pkg (probably
old-locale-1.0.0.2), without using --force, and I started getting
linker errors mentioning it, of the form:
<command line>: unknown package: old-locale-1.0.0.2
even though old-locale-1.0.0.2 appeared on no command line (not even
internal ones, I checked everything with -v); it was only a dependency
of a package mentioned on the command line of an internal command.

There is a small possibility that this was due to the older Cabal
which was installed with GHC - but IIRC the new cabal was one of the
first packages (or the first) I installed.

> My preferred solution is to follow the example of Nix and use a
> persistent package store. Then installing new packages (which includes
> what people think of as upgrading) become non-destructive operations:
> no existing packages would be broken by an upgrade.

> It would be
> necessary to allow installing multiple instances of the same version
> of a package.
That would solve Cabal bug 738 which I reported.

> If we do not allow multiple instances of a package then breaking
> things during an upgrade will always remain a possibility. We could
> work harder to avoid breaking things, or to try rebuilding things that
> would become broken but it could never be a 100% solution.

It is a good idea, but how do you handle removal requests? Also,
complete solutions do exist for non-persistent stores, but they are
much harder to get right.
In any case, allowing multiple versions of the same package is a good
idea, and in particular it would make upgrading Cabal itself much less
tricky.
The problem with package removal is still present, but it is less
important than safety (especially given that "cabal uninstall" is
still a TODO); and in a safe persistent system, one can use ghc-pkg
unregister and handle the dependencies manually.
I'd also like to point out that a non-persistent package store can be
made to work 100% of the time - with your proposal, the store would do
so by design.

> 2. This is a problem of information and optimistic or pessimistic
> assumptions. Technically there is no problem with typechecking or
> linking in the presence of multiple versions of a package. If we have
> a type Foo from package foo-1.0 then that is a different type to Foo
> from package foo-1.1. GHC knows this.

> So if for example a package uses regex or QC privately then other
> parts of the same program (e.g. different libs) can also use different
> versions of the same packages. There are other examples of course
> where types from some common package get used in interfaces (e.g.
> ByteString or Text). In these cases it is essential that the same
> version of the package be used on both sides of the interface
> otherwise we will get a type error because text-0.7:Data.Text.Text
> does not unify with text-0.8:Data.Text.Text.
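To make the "does not unify" point concrete, here is a minimal
single-file sketch. It cannot load two real versions of text, so
TextV07 and TextV08 are hypothetical stand-ins that I made up for the
two versioned types; greeting, textLength and upgrade are equally
invented names.

```haskell
-- Hypothetical stand-ins for text-0.7:Data.Text.Text and
-- text-0.8:Data.Text.Text. Both wrap the same representation,
-- yet the type checker treats them as unrelated types.
newtype TextV07 = TextV07 String deriving (Eq, Show)
newtype TextV08 = TextV08 String deriving (Eq, Show)

-- Imagine library A was built against the 0.7 stand-in and produces it.
greeting :: TextV07
greeting = TextV07 "hello"

-- Imagine library B was built against the 0.8 stand-in and consumes it.
textLength :: TextV08 -> Int
textLength (TextV08 s) = length s

-- Rejected by the type checker, because TextV07 does not unify
-- with TextV08:
-- broken :: Int
-- broken = textLength greeting

-- Crossing the interface needs an explicit conversion (assumed here;
-- nothing like it exists between real package versions).
upgrade :: TextV07 -> TextV08
upgrade (TextV07 s) = TextV08 s

viaConversion :: Int
viaConversion = textLength (upgrade greeting)
```

Uncommenting broken should give a type error, which is the situation
described in the quoted paragraph for types used in interfaces.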

> The problem for the package manager (i.e. cabal) is knowing which of
> the two above scenarios apply for each dependency and thus whether
> multiple versions of that dependency should be allowed or not.
> Currently cabal does not have any information whatsoever to make that
> distinction so we have to make the conservative assumption. If for
> example we knew that particular dependencies were "private"
> dependencies then we would have enough information to do a better job
> in very many of the common examples.

> My preference here is for adding a new field, build-depends-private
> (or some such similar name) and to encourage packages to distinguish
> between their public/visible dependencies and their private/invisible
> deps.

On a policy level, it is difficult for a developer to keep track of
which dependencies are public and which are private: you need to
inspect your public API manually.
On a mechanism level, I think that adding a field does not actually
work, because GHC's cross-module inlining can change the picture
unpredictably: cabal would need to check that packages in
build-depends-private are not mentioned in the .hi interface files -
but GHC can store implementation details there.
As a result, if cabal does no checking, a packager can easily shoot
its users (rather than itself) in the foot. If cabal does such
checking, getting it right requires trial and error from the
developer, and it will cause errors when the GHC version or the
optimization options change. We want neither scenario.

E.g. I just made up a syntax for a regexp library, and wrote a
function which checks that no (useless) trailing spaces are present
in some text:

checkNoTrailingSpace :: String -> Bool
checkNoTrailingSpace = not . regexpMatch "\\s+$"

Allowing inlining of such a function would turn a possibly private
dependency on some regexp package into a public one.
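A self-contained version of this example, as a rough sketch: since
regexpMatch is made-up syntax, I replace it with a local stand-in that
only understands the one pattern used here. The INLINE pragma is my
addition, to make explicit the case where GHC records the function
body in the interface file.

```haskell
import Data.Char (isSpace)

-- Hypothetical stand-in for the matcher of some regexp package.
-- It handles only the single pattern used below; a real library
-- would parse and interpret the pattern.
regexpMatch :: String -> String -> Bool
regexpMatch "\\s+$" str = not (null str) && isSpace (last str)
regexpMatch pat     _   = error ("unsupported pattern: " ++ pat)

-- True when the text has no trailing whitespace.
-- With INLINE, GHC keeps the unfolding in the .hi file, so client
-- modules can end up referring to regexpMatch directly, even though
-- the exported type mentions only String and Bool.
{-# INLINE checkNoTrailingSpace #-}
checkNoTrailingSpace :: String -> Bool
checkNoTrailingSpace = not . regexpMatch "\\s+$"
```

The type signature alone suggests a private dependency; only the
exposed body reveals the regexp package.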

However, automatic checking as I proposed (without extra help from
GHC) does not work either, and I show a counterexample, which is also
about "bad library design".
Suppose that V1 of package Foo has functions:

buildFoo bar baz = (bar, baz)
takeBar (bar,baz) = bar

and that V2 of Foo swaps the order of bar and baz in the underlying
pair. The pair representation should be either encapsulated by a data
constructor (but it is not), or part of the API and ABI and thus not
changeable. Today doing this causes no harm, but if linking multiple
versions of Foo were allowed, this would create a nightmare. If a data
constructor were used, versioned typechecking would catch the
problem.
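The hazard can be reproduced in a single file, as a sketch: the V1/V2
suffixes and the wrapped variants are names I invented to stand for
the two package versions, which in reality would live in separately
compiled packages.

```haskell
-- V1 of the hypothetical Foo API: the pair is (bar, baz).
buildFooV1 :: a -> b -> (a, b)
buildFooV1 bar baz = (bar, baz)

takeBarV1 :: (a, b) -> a
takeBarV1 (bar, _) = bar

-- V2 swaps the underlying representation to (baz, bar).
buildFooV2 :: a -> b -> (b, a)
buildFooV2 bar baz = (baz, bar)

takeBarV2 :: (b, a) -> a
takeBarV2 (_, bar) = bar

-- Built with V1, read with V2. This type-checks, because both sides
-- only see a plain pair of Strings, but it returns baz, not bar.
mixed :: String
mixed = takeBarV2 (buildFooV1 "bar" "baz")

-- With one constructor per version, the same mix-up is rejected.
newtype FooV1 = FooV1 (String, String)
newtype FooV2 = FooV2 (String, String)

wrappedBuildV1 :: String -> String -> FooV1
wrappedBuildV1 bar baz = FooV1 (bar, baz)

wrappedTakeBarV2 :: FooV2 -> String
wrappedTakeBarV2 (FooV2 (_, bar)) = bar

-- Type error, FooV1 is not FooV2:
-- rejected :: String
-- rejected = wrappedTakeBarV2 (wrappedBuildV1 "bar" "baz")
```

Here mixed evaluates to "baz": the silent corruption that versioned
typechecking cannot see when the representation is a bare pair.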

Since these functions could be fully inlined in module Foo2, it
becomes impossible to infer from the .hi files of Foo2 which
dependencies are public, unless GHC records which modules the function
bodies exposed in .hi files come from.

So, I propose to:
- depending on the solution, possibly educate library developers about
resulting pitfalls, if they are not supposed to write code like the
above.
- extend GHC to produce the needed information (if it does not already)
- use that for automatically checking which dependencies are public
and which are private (at package installation time)
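A rough sketch of what the last step could look like, assuming GHC
(or some tool) can report which packages are referenced from a
package's .hi files. BuiltPackage, its fields and classifyDeps are all
hypothetical names; nothing like this exists in Cabal as far as I know.

```haskell
import Data.List (partition)

type PackageName = String

-- Hypothetical summary of what the installer knows after building
-- a package.
data BuiltPackage = BuiltPackage
  { declaredDeps  :: [PackageName]
    -- ^ everything listed in build-depends
  , interfaceRefs :: [PackageName]
    -- ^ packages referenced from the .hi files, through types or
    --   through exposed function bodies (assumed to be reported by GHC)
  }

-- Split the declared dependencies into (public, private): a dependency
-- is public as soon as the interface files mention it.
classifyDeps :: BuiltPackage -> ([PackageName], [PackageName])
classifyDeps pkg =
  partition (`elem` interfaceRefs pkg) (declaredDeps pkg)
```

For instance, a package declaring base, text and regex whose
interfaces mention only base and text would get regex classified as
private. The result depends on the GHC version and optimization flags
used for the build, which is why I suggest computing it at
installation time rather than declaring it by hand.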

Best regards
-- 
Paolo Giarrusso - Ph.D. Student
http://www.informatik.uni-marburg.de/~pgiarrusso/

