Integrating head.hackage with GHC's CI infrastructure

Ben Gamari - 2020-06-11

Hello everyone,

As I mentioned last year in my infrastructure update, one of the many features that we have gained as a result of our migration to GitLab is the ability to incorporate testing against user code (e.g. from Hackage) into our routine CI testing. In the past year we have been hard at work to make this infrastructure a reality.

In this post we will describe the current state of that infrastructure as well as what you (yes, you!) can do to help us make the most of this infrastructure; no GHC hacking experience is required!

The goal

For a long time we have discussed using our large package repositories (Hackage and Stackage) for testing of GHC snapshots and pre-releases. Specifically, there are three ends which such testing might serve:

1. correctness testing: By confirming that user code compiles we have better assurance that we understand the full implications of changes made in GHC on our users and that those changes are implemented correctly.
2. compiler performance testing: By measuring the performance of the compiler as it compiles user code we learn more about the typical cost centers within GHC. While we have dedicated performance testsuites (e.g. nofib) that also serve this purpose, there is plenty of evidence that suggests that the programs in these testsuites are qualitatively different from modern Haskell programs.
3. runtime performance testing: As with (2), but measuring the performance of the compiled program itself rather than that of GHC.

While these potential benefits are significant, so are the challenges:

1. changes in GHC and core libraries: Due to the tight coupling between GHC and its core libraries (e.g. base, template-haskell), GHC releases are typically accompanied by library changes which often break user programs. These need to be patched, but in a way that allows package authors to respect the PVP.
2. persistent breakage: Because of the expected breakage mentioned in (1), any package set of non-trivial size will contain at least a few broken packages at any given time. For this reason, in contrast to typical CI pipelines, we want to be notified only when a package’s build state changes: for example when a package first breaks (e.g. due to a breaking change being merged to GHC), but not on every subsequent failing build.
3. changes in user programs: Particularly when tracking performance changes we must take care when updating the tested set of packages, lest we be fooled into thinking that a change in a user program is a regression in GHC.

While (2) and (3) are both non-trivial problems, a solution to (1) is close at hand in the form of the head.hackage patch-set.

Patching Hackage for pre-releases

The head.hackage project is a set of patches and associated tools for patching released Hackage packages to build with GHC pre-releases. While head.hackage has been a tool in GHC developers’ toolbox for over a year now, a few considerations have kept it from reaching its potential:

1. a lack of documentation and a few usability papercuts have limited adoption to a small set of developers.
2. the lack of integration with GHC’s own continuous integration infrastructure meant that testing of GHC snapshots had to be performed manually.
3. the lack of automated testing of the patchset has precluded scaling the approach to a larger set of packages.

The remainder of this post will discuss our recent work in implementing continuous integration infrastructure to address points (2) and (3). In a future post we will discuss work done to address (1) and walk the user through use of head.hackage to build a real-world package.

Testing infrastructure

Of course, the patch-set is of little value if it is not tested. For this reason we introduced continuous integration infrastructure, allowing us to build the patch-set with both released and pre-released compilers. These two build configurations test somewhat orthogonal notions of correctness:

Testing against GHC releases tests the patch-set, giving us (some) assurance that the patches themselves are correct.
Testing against master (or pre-releases) provides assurance that GHC itself hasn’t regressed.
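To make the distinction concrete, here is a minimal sketch (in Python, purely for illustration) of how the two results for a single package might be combined. The names `Diagnosis` and `diagnose` are hypothetical and are not part of head.hackage; the actual CI jobs simply report pass/fail for each configuration.

```python
from enum import Enum


class Diagnosis(Enum):
    OK = "builds with both compilers"
    PATCH_SUSPECT = "fails with a released GHC: suspect the patch"
    GHC_SUSPECT = "fails only with GHC master: suspect a change in GHC"


def diagnose(builds_with_release: bool, builds_with_master: bool) -> Diagnosis:
    """Combine the two build configurations for a single package."""
    if not builds_with_release:
        # A released compiler is a fixed target, so a failure here points
        # at the patch (or the package) rather than at recent GHC work.
        return Diagnosis.PATCH_SUSPECT
    if not builds_with_master:
        # The patched package is known to build with a release, so the
        # difference must come from a change in GHC (intended or not).
        return Diagnosis.GHC_SUSPECT
    return Diagnosis.OK
```

Note that a `GHC_SUSPECT` result does not by itself say whether the change in GHC was a regression or an intentional breaking change that needs a new patch; that still requires a human look at the build log.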

Happily, this effort has now converged on a usable result, embodied in three merge requests:

ghc/head.hackage!2 adds CI support to head.hackage. In addition to a pass/fail status for the overall build, this job produces (e.g. as seen in this run) a variety of additional products:

a JSON summary of the run, describing the dependency graph and pass/fail state of each package. We can feed this to an external service to track newly-failing packages.
a tarball of build logs, each including statistics from GHC’s -ddump-timings flag. Not only do these logs capture the reason for failure in the case of erroring builds, but they can be scraped for compiler performance metrics, allowing us to track compiler performance on a diverse set of real-world code.

These can be fed to downstream tools, allowing us to better understand and record the evolution of GHC’s performance and correctness.
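As a rough illustration of what scraping the build logs could look like, the following Python sketch sums allocations and time per compiler phase. It assumes timing lines of the shape `Phase [Module]: alloc=<bytes> time=<milliseconds>`; that shape should be checked against real logs before relying on it, and lines that do not match (including timing lines with no module name) are simply skipped. `TIMING_LINE` and `phase_totals` are illustrative names, not part of any existing tool.

```python
import re
from collections import defaultdict

# Assumed line shape: "<phase> [<module>]: alloc=<bytes> time=<milliseconds>"
# The time field may be printed in scientific notation (e.g. 5.0e-2).
TIMING_LINE = re.compile(
    r"^(?P<phase>.+?) \[(?P<module>[^\]]+)\]: "
    r"alloc=(?P<alloc>\d+) "
    r"time=(?P<time>\d+(?:\.\d+)?(?:[eE][-+]?\d+)?)$"
)


def phase_totals(log_text: str) -> dict:
    """Sum allocations (bytes) and time (ms) per compiler phase in one log."""
    totals = defaultdict(lambda: {"alloc": 0, "time": 0.0})
    for line in log_text.splitlines():
        match = TIMING_LINE.match(line.strip())
        if match is None:
            continue  # ordinary build output, error messages, and so on
        entry = totals[match["phase"]]
        entry["alloc"] += int(match["alloc"])
        entry["time"] += float(match["time"])
    return dict(totals)
```

Aggregating per phase rather than per module keeps the result small enough to compare between GHC commits, at the cost of hiding which modules dominate a given phase.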

Making patched packages accessible to users

Our final goal in this effort was to make the patched packages themselves readily accessible to users, allowing users to easily use GHC’s pre-releases to build larger projects. head.hackage’s continuous integration now produces a Hackage repository, which can be easily used to build existing projects using cabal v2-build’s remote repository support. Use of this repository will be the focus of a future blog post.

Future work

There are a few things that remain to be done:

Work out how to handle tracking of persistent breakage; for instance, we want a responsible party to be notified when a package initially breaks (e.g. when a breaking change is merged to GHC) but not in every subsequent build.
Determine a sustainable means to keep this patch-set up-to-date. Thus far this has fallen on the shoulders of a few dedicated contributors (thanks Ryan Scott!), but to make this work in the long term we need a more diverse group of maintainers. If this sort of work sounds like something you would be interested in contributing to, please do get in touch!

Furthermore, we might consider making authors of GHC patches which break head.hackage responsible for updating the broken packages, further spreading the maintenance load.
Currently our testing is limited to checking that compilation of the packages does not fail. However, we might also consider extending this to running package testsuites in select cases. This would give us further assurance of correctness, although it would likely significantly increase maintenance and computational cost.
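The first item above, notifying only when a package's state changes, amounts to comparing the summaries of two consecutive runs. A minimal Python sketch follows. The JSON layout used here (`{"packages": {"<name>": "pass" | "fail"}}`) is a hypothetical simplification chosen for the example and is not the schema that the CI job actually emits; `load_states` and `state_changes` are likewise illustrative names.

```python
import json


def load_states(summary_json: str) -> dict:
    """Parse a summary of the assumed (hypothetical) form
    {"packages": {"<name>": "pass" | "fail"}}."""
    return json.loads(summary_json)["packages"]


def state_changes(previous: dict, current: dict) -> dict:
    """Report only packages whose build state differs between two runs."""
    newly_broken = sorted(
        name for name, state in current.items()
        if state == "fail" and previous.get(name) == "pass"
    )
    newly_fixed = sorted(
        name for name, state in current.items()
        if state == "pass" and previous.get(name) == "fail"
    )
    # Packages absent from the previous run have no earlier state to
    # compare against, so they appear in neither list.
    return {"newly_broken": newly_broken, "newly_fixed": newly_fixed}
```

A package that was already failing in the previous run is deliberately left out of `newly_broken`, which is what keeps persistent breakage from generating a notification on every build.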

Finally, I would like to acknowledge Herbert Valerio Riedel whose vision for the head.hackage patch-set evolved into the infrastructure described above.

Cheers,