Contributing to GHC

simonpj - 2016-06-20

This post is a response to Anthony’s blog post about contributing to GHC, and the subsequent Reddit discussion. You’ll find it easier to follow my comments below if you read Anthony’s post first. Short summary: many of Anthony’s criticisms are accurate, but they are not easy to address, especially in a volunteer-only project. However we will do something about the feature-request process.

Barrier to entry

I am ashamed that GHC seems difficult to contribute to. The details don’t matter; the fact that it made you feel that way is by definition bad. I’m really sorry. We should find a way to do better.

An underlying issue is one of effort budget. GHC has essentially no full-timers, unlike many open-source projects. In particular, Simon Marlow and I are volunteers like everyone else, and like everyone else we have a day job. Microsoft Research generously pays Well Typed for front-line support, manifested in the heroic form of Ben and Austin, totalling around one FTE. But their effort is fully absorbed by managing releases, reviewing patches, maintaining the infrastructure, and so on.

It’s a real challenge to maintain a low barrier to entry for a large complex project, whose motive force is primarily volunteers. It means that any initiative will only fly if someone steps up to drive it.

Technical documentation

The questions of scattered and inconsistent documentation are difficult to address. GHC is twenty-five years old; it has hundreds of authors and documentation that was once accurate falls out of date. I would love it to have consistent, well-explained documentation. But I don’t know how to achieve it.

GHC’s technical documentation is either in the code itself, or on GHC’s wiki. An advantage of a wiki is that anyone can edit it. Yes, instructions and technical descriptions go out of date. Who will fix them? You, gentle reader! There is no one else.

But in practice few do. Perhaps that’s because such work is invisible, so no one even knows you’ve done it? What would make people feel “I’d really like to improve those instructions or that documentation?”

I will argue for two things. First, I find Notes incredibly helpful. They live with the code, so they are less likely to go out of date. They don’t mess up the flow of the code. They can be referred to from multiple places. They are the single most successful code documentation mechanism we have. (To be fair to Anthony, I don’t think was complaining about Notes, just expressing surprise.)

Second, I really do think that each new proposed feature needs a description, somewhere, of what it is, why it’s a good thing, and as precisely as possible what its specification is. Plus perhaps some implementation notes. It’s very important when reading an implementation to know what it is that the code is seeking to implement. So yes, I do frequently ask for a specification.

Arc, github, phabricator, etc

Personally I carry no torch for arc. I do whatever I’m told, workflow-wise. If GitHub has got a lot better since we took the Arc decision, and there’s a consensus that we should shift, I’m all for it provided someone explains to me what my new workflows should be. I am absolutely in favour of reducing barriers to contribution. (Other members of the GHC team have much stronger and better informed views than me, though.)

But there are costs to moving, especially if the move implies friction in the future. In particular, those that worry me most are those surrounding issue tracking. Github’s issue tracker simply doesn’t seem sufficient for a project as large and multi-faceted as GHC; in particular, tags as the only form of metadata is extremely limiting.

One alternative would be to use Github and Trac side-by-side, but then we face the issue of conflicting ticket number spaces as both systems insist on ticket numbers of the form #NNNN.

Coding style

Coding style is rather a personal thing, and we have been reluctant to enforce a single uniform style. (Quite apart from the task of reformatting 150k lines of Haskell.) Personally I don’t find it an obstacle to reading other people’s code.

Feature requests

That leaves the most substantial issue that Anthony poses: the process of making a contribution.

For fixing a bug, I think that (aside from the arc/Github debate) things are not too bad. On the arc/Github question, Austin has been working with the Phabricator maintainers to try to have them introduce a workflow which might be more familiar and more convenient for Github experts.

But for offering a new feature, Anthony argues that the process is unacceptably arduous. There are lots of things to think about

GHC has tons of features implemented by talented and motivated folk… who have since moved on. So when a new feature is proposed, my baseline guess is that I will personally be responsible for maintaining it in five years time. So I want to understand what the feature is. I want to understand how the implementation works. I want to be reasonably sure that it doesn’t add a bunch of complexity to an already very complicated code base. And since any new feature adds some complexity, I want to have some confidence that the feature commands broad support – even when it’s behind a language extension flag.

So actually I think it’s reasonable that the process should be somewhat arduous. A new feature imposes costs on every single person who works on that code in the future. We don’t really make this point explicitly, but the contributor guidelines for Phabricator do. It might be helpful if we articulated similar guidelines.
“Any proposal needs a lightweight way to gauge broad support, then a period of constructive refinement”. Indeed! What would be such a lightweight way? The trouble is that there is an enormous silent majority, and discussions are typically driven by a handful of articulate and well-informed contributors. All credit to them, but they may very well be an unrepresentative sample. I simply don’t know how to gauge true broad support.
There is a problem with my own personal lack of bandwidth. I am one of the main gatekeepers for some big chunks of GHC, the renamer, typechecker and optimisation passes. That is good in a way, because if GHC lacked gatekeepers, it would soon lose conceptual integrity and become a huge ball of mud. But it is bad in other ways. I review a lot of code; but not fast enough! In prioritising I am guided by my (doubtless flawed) perceptions of things that lots of people are eagerly awaiting. The same thing goes for Simon Marlow, especially in the runtime system. We both have other day jobs. Even writing this post means that I’m not reviewing someone’s code.

But I am acutely aware that “Simon and Simon are busy” is pretty cold comfort to someone awaiting a review. Maybe there should be a bunch of other people playing this role. That would be great. For example, Richard Eisenberg has taken responsibility for Template Haskell, which is totally fantastic. I would love to hear from people who are willing to take overall responsibility for a piece of GHC.

None of this is an argument for the status quo. Simon, Ben, Austin, Herbert, and I have been talking about a rather more systematic process for new features. We’d like to learn from experience elsewhere, rather than reinvent the wheel, such as the Rust process. Please suggest processes that you have seen working well elsewhere.