Libraries in the repo

Thu Aug 27 06:26:59 EDT 2009

On 27/08/2009 00:55, Don Stewart wrote:
> marlowsd:
>> Simon and I have been chatting about how we accommodate libraries in the
>> GHC repository.  After previous discussion on this list, GHC has been
>> gradually migrating towards having snapshots of libraries kept as
>> tarballs in the repo (currently only "time" falls into this category),
>> but I don't think we really evaluated the alternatives properly.  Here's
>> an attempt to do that, and to my mind the outcome is different: we
>> really want to stick to having all libraries as separate repositories.
>>
>> Background:
>>   * Scope: libraries that are needed to build GHC itself (aka "boot
>>     libraries")
>>
>>   * Boot libraries are of several kinds:
>>     - INDEPENDENT: Independently maintained (e.g. time, haskeline)
>>     - COUPLED: Tightly coupled to GHC, but used by others (base)
>>     - SPECIFIC: Totally specific to GHC (e.g. template-haskell, DPH)
>>
>>   * Most boot libraries are INDEPENDENT.  INDEPENDENT libraries have a
>>     master repository somewhere separate from the GHC repositories.
>>
>>   * We need a branch of INDEPENDENT libraries, so that GHC builds don't
>>     break when the upstream package is modified.
>>
>>   * Sometimes we want to make local modifications to INDEPENDENT
>>     libraries:
>>       - when GHC adds a new warning, we need to fix instances of the
>>         warning in the library to keep the GHC build warning-free.
>>       - to check that the changes work, before pushing upstream
>>
>>
>> Choices for how we deal with libraries in the GHC repository: (+) is a
>> pro, (-) is a con.
>>
>>    (1) Check out the library from a separate repo, using the darcs-all
>>        script.  The repo may either be a GHC-specific branch
>>        [INDEPENDENT], or the master copy of the package
>>        [SPECIFIC/COUPLED].
>>
>>        (+) we can treat every library this way, which gives a
>>            consistent story.  Consistency is good for developers.
>>        (+) [INDEPENDENT] makes it easy to push changes upstream and sync
>>            with the upstream repo (unless upstream is using a different
>>            VCS).
>>
>>        (-) [INDEPENDENT] we have to be careful not to let our branches
>>            get too far out of sync with upstream, and we must
>>            sync before releasing GHC.
>>
>>    (2) Put a snapshot tarball of the library in libraries/tarballs,
>>        but allow you to check out the darcs repo instead.
>>
>>        (-) [SPECIFIC/COUPLED] this approach doesn't really make sense,
>>            because we expect to be modifying the library often.
>>        (-) updating the snapshot is awkward
>>        (-) workflow for making a change to the library is awkward:
>>            - check out the darcs repo
>>            - make the change, validate it
>>            - push the change upstream (bump version?)
>>            - make a new snapshot tarball
>>            - commit the new snapshot to the GHC repo.
>>        (-) having tarballs in the repository is ugly
>>        (-) we have no revision history of the library
>>
>>    (3) The GHC repo *itself* contains every library unpacked in the
>>        tree.  You are allowed to check out the darcs repo instead.
>>
>>        (+) atomic commits to both the library and GHC.
>>        (+) doing this consistently would allow us to remove darcs-all,
>>            giving a nice easy development workflow
>>
>>        (-) [INDEPENDENT/COUPLED] still need a separate darcs repo.
>>        (-) [INDEPENDENT/COUPLED] pushing changes upstream is hard
>>        (-) [INDEPENDENT/COUPLED] manual syncing with upstream
>>        (-) [COUPLED] (particularly base) syncing with
>>            upstream would be painful.
>>
>>
>> (3) works best for SPECIFIC libraries, whereas (1) works best for
>> INDEPENDENT/COUPLED libraries.  If we want to treat all libraries the
>> same, then the only real option is (1).
>>
>> Experience with Cabal and bytestring has shown that (1) can work for
>> INDEPENDENT libraries, but only if we're careful not to get too
>> out-of-sync (as we did with bytestring).  In the case of Cabal, we never
>> have local changes in our branch that aren't in Cabal HEAD, and that
>> works well.
>>
>> Comments/thoughts?
>
>
> As author of bytestring, I'd prefer it if GHC used a released version
> direct from Hackage. I.e. GHC could snapshot a Hackage release, and get
> out of the business of cloning repos. Same for other INDEPENDENTs.

Are you saying you don't want us to have a GHC branch?  Even if the
branch just pulls from upstream and never has local changes?  We can
still use released versions only; the main point of having separate
repos is that we have a consistent picture of libraries from GHC's side.

For bytestring I imagine we can get away without making changes between
releases, or at least by ensuring our changes are sent upstream and
waiting for a release before pulling.  For other libraries, such as
Cabal, I think this would be too onerous (Cabal is really COUPLED at the
moment, much as we'd like it to be INDEPENDENT).
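
As a rough illustration of the classification quoted above, here is a
small Python sketch of which option suits which kind of library, and
which repo option (1) would check out.  It is only a model of the
proposal, not part of GHC's build system or the darcs-all script; the
names Kind, BOOT_LIBRARIES, best_option and repo_for_option_1 are
made up for this example.

```python
from enum import Enum


class Kind(Enum):
    """The three kinds of boot library named in the proposal."""
    INDEPENDENT = "independently maintained"
    COUPLED = "tightly coupled to GHC, but used by others"
    SPECIFIC = "totally specific to GHC"


# Example classification, using only the libraries named in the proposal.
BOOT_LIBRARIES = {
    "time": Kind.INDEPENDENT,
    "haskeline": Kind.INDEPENDENT,
    "base": Kind.COUPLED,
    "template-haskell": Kind.SPECIFIC,
}


def best_option(kind):
    """Option number that works best for a kind, per the summary:
    (3) for SPECIFIC libraries, (1) for INDEPENDENT/COUPLED ones."""
    return 3 if kind is Kind.SPECIFIC else 1


def repo_for_option_1(kind):
    """Under option (1), INDEPENDENT libraries come from a GHC-specific
    branch; SPECIFIC/COUPLED libraries come from the master copy."""
    return "ghc-branch" if kind is Kind.INDEPENDENT else "master"


if __name__ == "__main__":
    for name, kind in BOOT_LIBRARIES.items():
        print(name, kind.name, best_option(kind), repo_for_option_1(kind))
```

Since the per-kind answers differ, a single uniform policy has to pick
one of them, which is why the proposal settles on (1) for everything.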

Cheers,
	Simon