The Haskell Cabal: A Common Architecture for Building Applications and Tools

2. The Haskell Package System: overview

This section summarises the vocabulary and main features of the Haskell Package System.

2.1. Packages

A package is the unit of distribution for the Cabal. Its purpose in life, when installed, is to make available some Haskell modules for import by some other Haskell program. However, a package may consist of much more than a bunch of Haskell modules: it may also have C source code and header files, documentation, test cases, auxiliary tools and whatnot.

Each package has:

A globally-unique package name, containing no spaces. Chaos will result if two distinct packages with the same name are installed on the same system. How unique package names are handed out is not part of this specification, but there will presumably be some global web site where package authors can go to register a package name.
A version, consisting of a sequence of one or more integers.
A list of explicit dependencies on other packages. These are typically not exact; e.g. "I need hunit version greater than 2.4".
A list of exposed modules. Not all of the modules that comprise a package implementation are necessarily exposed to a package client. The ability to expose some, but not all, of the modules making up a package is rather like using an explicit export list on a Haskell module.

The first two components can be combined to form a single text string called the package ID, using a hyphen to separate the version from the name, and dots to separate the version components. For example, "hunit-2.3".
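As an illustration only, the construction of a package ID can be sketched in Haskell. The type and function names below (Version, PackageId, showVersion, showPackageId) are hypothetical and are not part of this specification; they simply follow the description above.

```haskell
import Data.List (intercalate)

-- | A version is a sequence of one or more integers, e.g. [2,3] for "2.3".
--   The derived Ord instance compares components lexicographically.
newtype Version = Version [Int]
  deriving (Eq, Ord, Show)

-- | Hypothetical record pairing a package name with its version.
data PackageId = PackageId
  { pkgName    :: String   -- ^ globally-unique name, containing no spaces
  , pkgVersion :: Version
  } deriving (Eq, Show)

-- | Render the version components separated by dots.
showVersion :: Version -> String
showVersion (Version parts) = intercalate "." (map show parts)

-- | Render the package ID: name, a hyphen, then the dotted version.
showPackageId :: PackageId -> String
showPackageId (PackageId name version) = name ++ "-" ++ showVersion version
```

With these definitions, showPackageId (PackageId "hunit" (Version [2,3])) evaluates to "hunit-2.3".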

2.2. Packages and the Haskell language

A complete Haskell program will consist of one or more modules (including Main) compiled against one or more packages (of which the Prelude is one). These packages are not referred to explicitly in the Haskell source; instead, the packages simply populate the hierarchical space of module names.

Complete programs must obey the following invariant. Consider all the Haskell modules that constitute a complete program: no two modules may have the same module name.

This invariant is conservative. It preserves the existing semantics of Haskell, and is relatively easy to implement. In particular, the full name of an entity (type, class, function), which is used to determine when two entities are the same, is simply a pair of the module name and the entity name.

The invariant is unsatisfactory, however, because it does not support abstraction at the package level. For example, a package with an internal (hidden, non-exposed) module called Foo cannot be used in the same program as another package with an unrelated internal module also called Foo. Nor can a program use two packages, P and Q, which depend on different versions of the same underlying package R. We considered more sophisticated schemes, in which (for example) the package name, or package ID, is implicitly made part of every module name. But (a) there is a big design space, and (b) it places new requirements on the implementations. Hence a conservative starting point.
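The invariant, and the way hidden modules can violate it, can be made concrete with a small sketch. The Package record and the moduleClashes function are hypothetical names introduced here for illustration. The check considers only the packages; a real check would also include the program's own modules.

```haskell
import Data.Function (on)
import Data.List (groupBy, sortOn)

type PackageName = String
type ModuleName  = String

-- | Hypothetical view of a package: every module it contains,
--   whether exposed to clients or hidden.
data Package = Package
  { packageName    :: PackageName
  , exposedModules :: [ModuleName]
  , hiddenModules  :: [ModuleName]
  }

-- | Module names supplied more than once, with the packages that supply them.
--   An empty result means the invariant holds for these packages.
moduleClashes :: [Package] -> [(ModuleName, [PackageName])]
moduleClashes pkgs =
  [ (m, map snd grp)
  -- The pattern only matches groups with at least two entries.
  | grp@((m, _) : _ : _) <- groupBy ((==) `on` fst) (sortOn fst pairs)
  ]
  where
    -- Hidden modules count too: the invariant ignores exposure.
    pairs =
      [ (m, packageName p)
      | p <- pkgs
      , m <- exposedModules p ++ hiddenModules p
      ]
```

Two packages P and Q that each hide an unrelated module Foo yield the clash ("Foo", ["P", "Q"]), even though no client could ever import either Foo.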

2.3. Packages and compilers

We use the term "compiler" to mean GHC, Hugs, Nhc98, hbc, etc. (Even though Hugs isn't really a compiler, the term is less clumsy than "Haskell implementation".)

The Cabal requires that a conforming Haskell compiler is somewhat package aware. In summary, the requirements are these:

Each compiler hc must provide an associated package-management program hc-pkg. A compiler user installs a package by placing the package's supporting files somewhere, and then using hc-pkg to make the compiler aware of the new package. This step is called registering the package with the compiler.
To register a package, hc-pkg takes as input an installed package description (IPD), which describes the installed form of the package in detail. The format of an IPD is given in Section 3.4.
Subsequent invocations of hc will include modules from the new package in the module name space (i.e. visible to import statements).
The compiler should support -package and -hide-package flags for finer-grain control of package visibility.

A complete specification of these requirements is given in Section 3.
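The effect of registration on the module name space can be modelled roughly as follows. This is a toy model under stated assumptions, not the behaviour of any real hc-pkg. The InstalledPackage record is a much-reduced, hypothetical stand-in for an IPD, and the choice to replace an existing entry of the same name on re-registration is an assumption made for this sketch.

```haskell
type ModuleName = String

-- | Hypothetical, much-reduced stand-in for an installed package description.
data InstalledPackage = InstalledPackage
  { ipName    :: String
  , ipExposed :: [ModuleName]
  } deriving (Eq, Show)

-- | The set of packages a compiler knows about.
newtype PackageDb = PackageDb [InstalledPackage]
  deriving (Eq, Show)

emptyDb :: PackageDb
emptyDb = PackageDb []

-- | Model of registering a package. Assumption: a package registered
--   under an existing name replaces the earlier entry.
register :: InstalledPackage -> PackageDb -> PackageDb
register pkg (PackageDb pkgs) =
  PackageDb (pkg : filter ((/= ipName pkg) . ipName) pkgs)

-- | Model of unregistering a package by name.
unregister :: String -> PackageDb -> PackageDb
unregister name (PackageDb pkgs) =
  PackageDb (filter ((/= name) . ipName) pkgs)

-- | Modules visible to import statements, given the package names
--   that the user asked to hide (in the spirit of -hide-package).
visibleModules :: [String] -> PackageDb -> [ModuleName]
visibleModules hidden (PackageDb pkgs) =
  concat [ ipExposed p | p <- pkgs, ipName p `notElem` hidden ]
```

In this model, registering a package makes its exposed modules visible to every later query, and hiding the package by name removes them again without unregistering it.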

2.4. Package distributions

A Cabal package can be distributed in several forms:

A Cabal source distribution is a tree of files (tar-ball, zip file etc) containing the tool's sources, which may need to be compiled before being installed. The same source tarball may well be installable for several Haskell implementations, OSs, and platforms.
A source distribution may contain fewer files than appear in the developer's CVS repository; for example, design notes may be omitted. It may also contain some derived files that do not appear in the developer's repository; for example, ones made by a somewhat exotic pre-processor where it seems simpler to ship the derived file than to ensure that all consumers have the pre-processor.
A Cabal binary distribution is a tree of files that contains a pre-compiled tool, ready for installation. The pre-compilation means that the distribution will be Haskell-compiler-specific, and certain "looser" dependencies (hunit > 2.3) will now be precisely fixed (hunit == 2.4).
The package may be wrapped up as an RPM, Debian package, or Windows installer (this list is not exhaustive). In that case, the way it is installed is prescribed by the respective distribution mechanism; the only role of the Cabal is to make it easy to construct such distributions. All three are compiler-specific (indeed compiler-version-specific) binary distributions.
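The way a binary distribution turns a loose dependency into a fixed one can be sketched as follows. The Constraint type and the satisfies and pin functions are hypothetical and cover only the two forms of constraint used in the examples above; the real dependency syntax is not defined in this section.

```haskell
-- | A version as a sequence of integers, compared lexicographically.
type Version = [Int]

-- | A deliberately small, hypothetical set of version constraints.
data Constraint
  = GreaterThan Version   -- ^ loose, e.g. hunit > 2.3
  | EqualTo Version       -- ^ exact, e.g. hunit == 2.4
  deriving (Eq, Show)

-- | Does a concrete version meet a constraint?
satisfies :: Version -> Constraint -> Bool
satisfies v (GreaterThan lo) = v > lo
satisfies v (EqualTo w)      = v == w

-- | Fix a constraint to the version the binary was actually built against.
--   Returns Nothing if that version does not satisfy the constraint.
pin :: Version -> Constraint -> Maybe Constraint
pin built c
  | built `satisfies` c = Just (EqualTo built)
  | otherwise           = Nothing
```

Here pin [2,4] (GreaterThan [2,3]) evaluates to Just (EqualTo [2,4]), mirroring the step from "hunit > 2.3" in the source distribution to "hunit == 2.4" in the binary one.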

2.5. The Setup script

The key question is this: how should Angela Author present her Cabal package so that her consumers (Bob, Sam, Willie, etc) can conveniently use it?

Answer: she provides a tree of files, with two specific files in the root directory of the tree:

Setup.description contains a short description of the package: specifically, the package name, version, and dependencies. It may also contain further information specific to the particular build system. The syntax of the package description file is given in Section 4.1.
Setup.lhs is an executable Haskell program which conforms to a particular specification, given in detail in Section 4. In summary, though, Setup.lhs allows a consumer to configure, build, test, install, register, and unregister a package.

The Setup script is an interface. It is meant to give a standard look-and-feel to packages for the sake of Joe User, Bob Builder, Peter Packager, Sam Sysadmin, and Rowland RPM, as well as for layered software tools. This interface provides an abstraction layer on top of any implementation that Angela or Marcus prefers.
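To make the idea of a uniform interface concrete, here is a minimal sketch of how a setup script might recognise the actions listed above. The Command type and the command-line spellings are assumptions made for illustration; the actual command-line interface is the one specified in Section 4.

```haskell
-- | The actions a consumer can ask a setup script to perform,
--   taken from the list in the text.
data Command
  = Configure
  | Build
  | Test
  | Install
  | Register
  | Unregister
  deriving (Eq, Show, Enum, Bounded)

-- | Assumed command-line spelling of each action.
commandName :: Command -> String
commandName c = case c of
  Configure  -> "configure"
  Build      -> "build"
  Test       -> "test"
  Install    -> "install"
  Register   -> "register"
  Unregister -> "unregister"

-- | Every command, in declaration order.
allCommands :: [Command]
allCommands = [minBound .. maxBound]

-- | Recognise a command name; Nothing for anything unknown.
parseCommand :: String -> Maybe Command
parseCommand s = lookup s [ (commandName c, c) | c <- allCommands ]
```

Because every package answers to the same small set of commands, a layered tool can drive any package without knowing how its author chose to implement them.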

The Cabal allows a package author to write the setup script in any way she pleases, provided it conforms to the specification of Section 4. However, many Haskell packages consist of little more than a bunch of Haskell modules, and for these the Cabal provides the simple build infrastructure, a Haskell library that does all the work. The simple build infrastructure, which was used for the example in Section 1.2, is described in Section 5.

In principle, the Setup script could be written in any language; so why do we use Haskell?

Haskell runs on all the systems of interest.
Haskell's standard libraries should include a rich set of operating system operations needed for the task. These can abstract away the differences between systems in a way that is not possible for Make-based tools.
Haskell is a great language for many things, including tasks typically relegated to languages like Python. Building, installing, and managing packages is a perfect proving ground for these tasks, and can help us to discover weaknesses in Haskell or its libraries that prevent it from breaking into this "market". A positive side-effect of this project might be to make Haskell more suitable for "scripting" tasks.
Likewise, each piece of the project (Building, Installing, and Packaging) can be leveraged elsewhere if we make them into libraries.
Make is not particularly good for parsing, processing, and sharing meta-information about packages. The availability of this information to Haskell systems (including compilers, interpreters, and other tools) is useful. Unlike Make, Haskell can also reuse unrelated algorithms, parsers, and other libraries that have been developed in the past.
Dogfooding, the act of using the tools you develop, is a healthy policy.

It is convenient for consumers to execute Setup.lhs directly, thus:

  ./Setup.lhs ...

This can be achieved by starting Setup.lhs with "#! /usr/bin/env runhugs" or "#! /usr/bin/env runghc". Since it's a literate Haskell script (.lhs file), the Haskell compiler will ignore this line. However, nothing stops a consumer from running the script interactively, or compiling it and running the compiled binary. Any implementation of Haskell should suffice to run the script, provided the implementation has the Cabal libraries installed.
