GHC Infrastructure Update

Ben Gamari - 2019-04-03

Around five months ago I proposed that we undertake what became a comprehensive rebuild of GHC’s infrastructure. Since November we have been quietly working behind the scenes to make this new infrastructure a reality. This has been a massive project; however, I’m happy to say that we are now emerging on the other side and are very pleased with the result.

In this post I want to take this opportunity to describe why this project was needed, what it has entailed, and where it has brought us. Enjoy!

Motivation

Most of our users are aware that GHC is an old project: the earliest release I have found is GHC 0.29, released nearly 23 years ago.

In addition to being old, GHC is also a large project and, like most large projects, has a significant amount of infrastructure to keep things moving smoothly. Before the migration this infrastructure included:

  • gitolite, our repository hosting service (git.haskell.org)
  • home-grown infrastructure for maintaining git mirrors to and from GitHub
  • our homepage (haskell.org/ghc)
  • Trac, our issue tracker and wiki (ghc.haskell.org/trac/ghc)
  • a number of home-grown linting scripts for ensuring code quality and preventing mistakes
  • Phabricator, our code review tool (phabricator.haskell.org)
  • our continuous integration services (Phabricator and more recently CircleCI/Appveyor)

While this all worked reasonably well it was not without its share of pain-points. These issues ranged from minor (e.g. a constant trickle of effort going towards maintaining consistency between Phabricator, git, and Trac) to serious (e.g. our servers stuck on a soon-to-be deprecated Debian release; a need for constant fiddling to keep CI builders running). Nevertheless, none of these issues seemed significant enough to force any immediate change.

This calculation changed around twelve months ago, when we received word that Rackspace would be ending its open-source software program, which graciously provided hosting for our servers for the last six years (thank you, Rackspace!).

Our first inclination was to simply rebuild GHC’s existing services on a new hosting provider. However, as I began this process in the summer of 2018 the scale of the challenge became apparent. Not only was our primary server an organic mix of scripts and configuration with varying degrees of documentation, but some of our services ranged from unsupported (e.g. Phabricator, which no longer supported non-paying customers), to infrequently maintained (e.g. many of the plugins used by our Trac instance have not had a release in four years or more), to obsolete (e.g. we still used gitolite v2, which has been deprecated for years).

In light of this, it seemed clear that simply rebuilding our existing infrastructure would be a mistake: despite requiring a significant investment of resources, the rebuild would resolve none of the friction that our existing infrastructure caused and would likely once again devolve into a difficult-to-maintain jungle of configuration. If there was ever a time to change, it was now.

To complicate matters, however, we were already deep in a rebuild of our continuous integration infrastructure, building upon CircleCI and Appveyor, to support our new six-month release cadence. While this change was in many ways a great improvement over our previous CI infrastructure, it also introduced a number of integration challenges (e.g. tying build results back to Phabricator) that we were still trying to solve.

Planning the migration

To replace our infrastructure there were two serious contenders: GitHub and GitLab. Happily, both options would move us towards a more git-centric contribution workflow, addressing one of the greatest concerns that potential contributors expressed in our development priorities survey last fall.

While other projects (notably Rust) have demonstrated that it is possible to maintain a large-scale open-source project on GitHub, it was far from clear that GHC could pull this off with our comparatively limited resources and significantly larger legacy migration needs (e.g. we concluded early on that any migration must faithfully preserve GHC’s ticket history, including ticket numbers).

In addition, GitLab had been successfully adopted by GNOME and freedesktop.org, with KDE making motions towards a migration as well.

For these and other reasons that are well covered elsewhere, GitLab seemed like a better fit for GHC’s needs.

By early November 2018 there was consensus to move ahead with migrating GHC to GitLab. To keep the migration manageable, we carefully limited the project’s scope to code review, repository hosting, ticket tracking, and the wiki, leaving any migration of continuous integration for future work.

Starting work

In mid-November we started work on the migration as two parallel efforts:

  • phase 1: migrating repository hosting and code review
  • phase 2: migrating ticket tracking and the wiki

Phase 1 was intended to be a small project, allowing us to migrate quickly to our new code review platform and begin the process of decommissioning our old systems.

Phase 2 was significantly riskier, involving the migration of 16,000 tickets, carrying over 100,000 comments of human-written markup. Thankfully, we had the benefit of being able to build upon Matthew Pickering’s previous prototype infrastructure for migrating Trac tickets to Maniphest, Phabricator’s issue tracker.

Phase 1 and scope creep

It is sometimes said that no plan survives first contact with reality; GHC’s migration plan was no exception. In early December 2018 we were notified that imminent changes to CircleCI’s pricing model would essentially preclude further use of the platform. While GitLab provided us with a convenient alternative to CircleCI, this development significantly enlarged the previously carefully-bounded scope of our migration plan.

While CircleCI generously provided us with a two-week extension to our (free) CI plan, the weeks that followed were a scramble to rebuild our CI infrastructure before support vanished.

Regardless, by late December we had finished phase 1 of the migration, including rudimentary CI support, and designated https://gitlab.haskell.org/ghc/ghc as GHC’s official upstream source repository.

Throughout this process we were lucky to have the support of GitLab’s director of community relations, David Planella. David has been an invaluable resource, helping us plan our migration and quickly draw attention to the occasional bug report.

Moreover, GitLab was remarkably responsive to the pain-points that we encountered in our workflow. While many examples can be found via the GHC migration tracking ticket, one in particular stands out: soon after moving code review to GitLab we found that our use of the “fast-forward only” merge strategy (necessary to preserve bisectability), coupled with our six-hour CI build times, resulted in a very poor patch merge workflow. While we adopted marge-bot as a near-term workaround, David and James of GitLab were happy to hear out and reflect on our use-case, using our experience to design the merge train feature that will be available in an upcoming GitLab release.
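
To make the problem concrete, here is a back-of-the-envelope model in Python. It is an idealised sketch under stated assumptions (every pipeline passes, CI takes a fixed time, and a merge train runs a fixed number of pipelines in parallel), not a description of how marge-bot or GitLab actually schedule work. With fast-forward-only merging, each merge request must be rebased onto the latest master and re-tested before it can land, so CI runs for successive merges cannot overlap.

```python
import math


def sequential_hours(num_mrs: int, ci_hours: float) -> float:
    """Fast-forward-only merging, one merge request at a time.

    Each merge request is rebased onto the latest master and must pass a
    full CI run before it lands, so CI runs cannot overlap.
    """
    return num_mrs * ci_hours


def merge_train_hours(num_mrs: int, ci_hours: float, train_length: int) -> float:
    """Idealised merge train (an assumption for illustration).

    Up to `train_length` merge requests are tested in parallel, each against
    master plus the requests ahead of it in the train. Assumes every pipeline
    passes; a failure would force the requests behind it to be re-tested.
    """
    return math.ceil(num_mrs / train_length) * ci_hours


if __name__ == "__main__":
    print(sequential_hours(8, 6))      # 48
    print(merge_train_hours(8, 6, 4))  # 12
```

With six-hour builds, eight queued merge requests take 48 hours to land one at a time under this model, but only 12 hours with a train of four. In practice a failing pipeline forces the requests behind it to be re-tested, so the real gain depends on how often builds fail.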

Wiki and ticket migration

Phase 2 of the migration involved moving GHC’s tickets and wiki pages to GitLab. For the tickets we used a relatively straightforward parser for Trac’s markup syntax, coupled with a simple pretty-printer.
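
For a flavour of what such a conversion involves, here is a much smaller, regex-based stand-in written in Python. It is not the migration’s actual parser and pretty-printer; it is a hypothetical sketch that handles only a few Trac constructs (`{{{ }}}` code blocks and inline code, `= headings =`, `'''bold'''`, `''italic''`, and `[https://example.org label]` links) and passes everything else through unchanged.

```python
import re

# Inline code ({{{...}}} on a single line); the capture group makes re.split
# return plain text at even indices and code contents at odd indices.
INLINE_CODE = re.compile(r"\{\{\{(.+?)\}\}\}")
HEADING = re.compile(r"^(=+)\s+(.+?)\s+\1\s*$")


def convert_inline(text: str) -> str:
    """Convert inline Trac markup in one line of non-code text."""
    parts = INLINE_CODE.split(text)
    for i, part in enumerate(parts):
        if i % 2 == 1:
            parts[i] = f"`{part}`"  # code span: contents left untouched
            continue
        part = re.sub(r"'''(.+?)'''", r"**\1**", part)  # bold before italic
        part = re.sub(r"''(.+?)''", r"*\1*", part)
        part = re.sub(r"\[(https?://\S+)\s+([^\]]+)\]", r"[\2](\1)", part)
        parts[i] = part
    return "".join(parts)


def trac_to_markdown(source: str) -> str:
    """Convert a small subset of Trac wiki markup to Markdown."""
    out = []
    in_code = False
    for line in source.splitlines():
        stripped = line.strip()
        if in_code:
            if stripped == "}}}":
                out.append("```")
                in_code = False
            else:
                out.append(line)  # code block contents are copied verbatim
        elif stripped == "{{{":
            out.append("```")
            in_code = True
        elif (m := HEADING.match(line)):
            level = len(m.group(1))
            out.append("#" * level + " " + convert_inline(m.group(2)))
        else:
            out.append(convert_inline(line))
    return "\n".join(out)


if __name__ == "__main__":
    print(trac_to_markdown("== Building ==\nRun '''make''' or ''hadrian''."))
```

Inline code is split out before the emphasis rules run, so quote marks inside code are left alone. Real Trac markup nests these constructs far more freely, which is where a regex-based approach breaks down and a proper parser earns its keep.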

While this parser-based approach worked well enough for ticket descriptions and comments, it performed unacceptably poorly on wiki pages due to syntactic ambiguity and the significantly richer markup they use. To handle this we augmented our conversion script with an HTML parser that recovers Markdown from the rendered Trac wiki pages. While inelegant, this approach was significantly better at preserving the rich markup found in many wiki pages.
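
The HTML route can be sketched with Python’s standard-library `html.parser.HTMLParser`. This is an illustrative assumption about the approach, not the conversion script itself: the class name and the small set of supported tags (headings, paragraphs, emphasis, inline and preformatted code, links, list items) are chosen for the example. The appeal of this route is that Trac’s own renderer has already resolved the markup’s ambiguities, so mapping the result back to Markdown is mostly a matter of walking tags.

```python
import re
from html.parser import HTMLParser

HEADINGS = ("h1", "h2", "h3", "h4", "h5", "h6")


class MarkdownFromHTML(HTMLParser):
    """Rebuild Markdown from a small subset of rendered HTML."""

    def __init__(self) -> None:
        super().__init__()
        self.out = []     # Markdown fragments in document order
        self.hrefs = []   # targets of currently open <a> tags
        self.in_pre = False

    def handle_starttag(self, tag, attrs):
        if tag in HEADINGS:
            self.out.append("\n\n" + "#" * int(tag[1]) + " ")
        elif tag == "p":
            self.out.append("\n\n")
        elif tag == "li":
            self.out.append("\n- ")
        elif tag in ("strong", "b"):
            self.out.append("**")
        elif tag in ("em", "i"):
            self.out.append("*")
        elif tag == "code" and not self.in_pre:
            self.out.append("`")
        elif tag == "pre":
            self.in_pre = True
            self.out.append("\n\n```\n")
        elif tag == "a":
            self.hrefs.append(dict(attrs).get("href") or "")
            self.out.append("[")

    def handle_endtag(self, tag):
        if tag in HEADINGS:
            self.out.append("\n")
        elif tag in ("strong", "b"):
            self.out.append("**")
        elif tag in ("em", "i"):
            self.out.append("*")
        elif tag == "code" and not self.in_pre:
            self.out.append("`")
        elif tag == "pre":
            self.in_pre = False
            # Close the fence on its own line.
            self.out.append("```\n" if self.out[-1].endswith("\n") else "\n```\n")
        elif tag == "a" and self.hrefs:
            self.out.append(f"]({self.hrefs.pop()})")

    def handle_data(self, data):
        if self.in_pre:
            self.out.append(data)  # preformatted text is kept verbatim
            return
        text = re.sub(r"\s+", " ", data)  # collapse whitespace as a browser would
        if self.out and self.out[-1].endswith(("\n", "# ", "- ")):
            text = text.lstrip()  # no stray spaces at the start of a block
        if text:
            self.out.append(text)


def html_to_markdown(html: str) -> str:
    parser = MarkdownFromHTML()
    parser.feed(html)
    parser.close()
    markdown = "".join(parser.out)
    # Squeeze runs of blank lines (this also affects blank lines inside
    # code blocks, a simplification acceptable for a sketch).
    markdown = re.sub(r"\n{3,}", "\n\n", markdown)
    return markdown.strip() + "\n"
```

A real converter would also need to handle tables, nested lists, and Trac-specific macro output in the rendered pages.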

After several dry-run migrations in January and February, the conversion quality was deemed acceptable by early March, and the final migration was carried out on 9 March 2019. This was a bit later than our goal of finishing the final import by mid-February, but given the scale of the task this wasn’t surprising.

Improving CI coverage

Testing the many configurations supported by GHC has been a challenge on each of the CI platforms we have used. Most recently, while CircleCI offered many benefits, it complicated testing of non-x86-64/Linux targets due to the platform’s limited operating system support, build time limits, and the high cost of build time. In this respect the move to GitLab opened up a number of opportunities.

Throughout February and March 2019 we focused on extending the CI infrastructure we built in phase 1 to cover these non-standard configurations. As of today, every GHC merge request is tested on over a dozen operating system/architecture/build-parameter configurations, with binary distribution artifacts produced and archived for most of these.

In addition to plain GHC builds, we have also realized the long-sought goal of regularly testing GHC snapshots against user code. For this we have combined Herbert Valerio Riedel’s head.hackage patch-set with Matthew Pickering’s ghc-artefact-nix Nix expression. Together with some glue logic, this combination gives us the ability to test CI-produced binary distributions against several dozen Hackage packages, with more to be added in the future.

In addition to testing for regressions in correctness, our head.hackage CI records compile-time and allocation metrics, allowing us to track compiler performance on real-world Haskell code. We hope that this will provide better insight into GHC’s compile-time costs and expect to use this insight in future work on improving compiler performance.
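
As an illustration of how such metrics can be used, the Python sketch below compares per-package measurements from a snapshot against a baseline and flags allocation increases beyond a threshold. The `Metrics` record, the `find_regressions` helper, the 5% default threshold, and the package names in the usage example are all hypothetical, not the head.hackage job’s actual format. The sketch keys on allocations rather than compile time because allocation counts are typically much less noisy than wall-clock timings on shared CI machines.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Metrics:
    compile_seconds: float  # wall-clock build time (kept for reporting)
    bytes_allocated: int    # total allocation by the compiler


def find_regressions(baseline: dict, current: dict, threshold: float = 0.05) -> list:
    """Return a line for each package whose allocations grew by more than
    `threshold` (a fraction, so 0.05 means 5%) relative to the baseline."""
    report = []
    for pkg, base in sorted(baseline.items()):
        cur = current.get(pkg)
        if cur is None or base.bytes_allocated == 0:
            continue  # no usable comparison (package missing or zero baseline)
        change = (cur.bytes_allocated - base.bytes_allocated) / base.bytes_allocated
        if change > threshold:
            report.append(f"{pkg}: allocations +{change:.1%}")
    return report


if __name__ == "__main__":
    baseline = {"pkg-a": Metrics(10.0, 1_000_000), "pkg-b": Metrics(20.0, 2_000_000)}
    current = {"pkg-a": Metrics(10.5, 1_100_000), "pkg-b": Metrics(19.0, 2_020_000)}
    print(find_regressions(baseline, current))  # ['pkg-a: allocations +10.0%']
```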

Further automation

One of the major motivations for the move to GitLab was the promise of consolidating and automating more project management tasks, removing human bottlenecks and increasing GHC’s bus factor. Towards this end we have automated many of the more mundane aspects of GHC’s infrastructure:

  • CI-triggered builds of the Docker images on which our CI processes are built
  • automated deployment of GHC documentation snapshots
  • automated generation and deployment of GHC’s website (including this blog)
  • well-documented, version-controlled, and maintainable configuration for our servers and CI runners built upon NixOS

While each of these investments is small, we hope that making them now will pay dividends in more development time to work on GHC itself in the years to come.

Documenting the development process

The move to GitLab has required us to redesign many of the conventions and protocols used in the course of GHC’s development. In this process we have taken the opportunity to document these conventions more coherently. From GHC’s Working Conventions page, contributors will now find links to comprehensive documentation describing GHC’s ticket triage and code review protocols.

We have also rewritten our newcomer’s documentation, making it easier for someone new to GHC development to get from forking the compiler to submitting a patch.

What remains to be done

While the dust from the migration has started to settle, there is still plenty to be done. Although the wiki import is complete, a great deal of cleanup remains. If you have a few idle minutes, feel free to browse around looking for import mistakes and spurious Trac references.

There is also much to be done to further improve the state of GHC’s continuous integration jobs. From fixing broken tests on Windows, to contributing a FreeBSD builder, to making the head.hackage job output more legible, there are plenty of ways in which we would appreciate help.

Additionally for the 8.10 release cycle we would like to greatly increase the size of the package set tested by head.hackage and automate the publication of the head.hackage.org package repository. This will allow all users to easily test their packages against GHC snapshots and prereleases and will further shrink GHC’s development feedback cycle.

Finally, one of the casualties of the GitLab migration has been the 8.8 release schedule, which was originally slated to culminate with the 8.8.1 release in mid-March. However, I will leave this topic for another blog post.

Closing thoughts

Needless to say, the last few months have been a whirlwind. However, we think that the result is quite exciting. Not only is GHC’s test infrastructure both more reliable and more thorough than it has ever been, but the tools with which the project is developed, released, and maintained are also more sustainable and far more inviting to newcomers than they were only six short months ago.

If you are interested in contributing to GHC, but so far have been intimidated by our tools, we encourage you to give it another go. Start from the newcomer’s guide, browse our list of newcomer-friendly tickets, and pick something that suits your skills. As always, if you ever get stuck or find some documentation that is unclear, just ping us on #ghc on Freenode or on ghc-devs.

Acknowledgements

This migration would not have been possible without people both inside and outside of the Haskell community:

  • Matthew Pickering for his help in configuring and maintaining our GitLab instance, including many thankless hours debugging marge-bot
  • Takenobu Tani for his attention to detail in spotting and fixing issues with GHC’s wiki and documentation both before and after the migration
  • Tobias Dammers for his work on the import script and help cleaning up the wiki
  • Packet for generously offering their excellent hosting services
  • Google X, Serokell, and Packet for their sponsorship of our CI infrastructure
  • Carter Schonwald for his work in looking after our macOS builder
  • Davean Scies and Futureice for their donations of GHC’s macOS builders
  • The members of the GHC devops group for their consideration and feedback over the course of the migration.
  • David Planella and everyone at GitLab for their help executing the migration
  • All of GHC’s users and contributors for their patience while we worked our way through this migration