how to checkout proper submodules

Austin Seipp aseipp at pobox.com
Wed Jun 5 04:05:58 CEST 2013


(Warning: incoming answer, followed by a rant.)

Base is not a submodule, meaning that there is essentially no way to
automatically check it back out to the "exact same state" it was in,
given some specified GHC commit - the commit IDs are not tracked.

At this point, you are basically on your own. You'll have to manually
check out libraries/base to a specific commit that occurred 'around'
the same time as the GHC commit. In this case, that means looking
through whatever commits hit HEAD on May 7th:

$ cd libraries/base
$ git log --until="May 7th"

The resulting list will show you what happened up to May 7th. Take the
latest commit in that list, and check out base to that revision. Any
commits afterward happened on May 8th or later:

$ git checkout -b temporary-io-fix <sha1 of latest May 7th commit>

You're going to need to do this for every module that is not tracked
as a submodule. Most of the repositories are very low-activity. base &
testsuite are going to be the annoying ones.
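
The per-repository steps above can be sketched as a small shell helper.
This is only a sketch under assumptions: the function name
'checkout_as_of' is made up for illustration, the repository paths you
pass in are up to you, and committer dates only approximate what each
repository looked like when the GHC commit was made. It searches from
each repository's current HEAD, so run it while those repositories are
still on their up-to-date branch.

```shell
#!/bin/sh
# Sketch (hypothetical helper, not part of the GHC tree): put each
# free-standing repository at the newest commit reachable from its HEAD
# that was committed no later than a given GHC commit.
# Usage, from the top of the GHC tree:
#   checkout_as_of <ghc-commit> libraries/base testsuite ...
checkout_as_of() {
    ghc_commit=$1
    shift
    # Committer date of the GHC commit, e.g. "2013-05-07 12:00:00 +0000".
    when=$(git log -1 --format=%ci "$ghc_commit") || return 1
    for repo in "$@"; do
        # Newest commit on the current history with committer date <= $when.
        sha=$(cd "$repo" && git rev-list -1 --before="$when" HEAD) || return 1
        if [ -z "$sha" ]; then
            echo "$repo: no commit on or before $when" >&2
            return 1
        fi
        echo "$repo -> $sha"
        # Detached checkout; create a branch afterwards if you plan to commit.
        (cd "$repo" && git checkout -q "$sha") || return 1
    done
}
```

This only automates the guess; the resulting tree may still fail to
build, in which case you are back to trial and error.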

You'll have to continue this 'manual bisection' by hand, with a very
hefty dose of frustrating trial-and-error, in my experience.

There is a secondary alternative. GHC has a script called
'fingerprint.py' (in utils/fingerprint/) which is somewhat designed to
work around this deficiency (very poorly). This script basically dumps
out a text file containing key/value pairs that map every repository
to its current HEAD commit. It can then take that text file and
automatically do 'git checkout' for you in every repo. The idea is you
can take fingerprints of the tree, save the results, and cleanly check
out to some state later.
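
To make the idea concrete, here is a minimal sketch of the same
save/restore cycle in plain shell. It is not the real fingerprint.py
and does not reproduce its file format or command line; the function
names and the "sha path" line format are assumptions made for this
example.

```shell
#!/bin/sh
# Sketch of the fingerprint idea only; NOT the real fingerprint.py.
# The file format here (one "sha path" line per repository) is invented.

# save_fingerprint FILE REPO...
# Record the current HEAD commit of every listed repository.
save_fingerprint() {
    out=$1
    shift
    : > "$out"
    for repo in "$@"; do
        sha=$(cd "$repo" && git rev-parse HEAD) || return 1
        printf '%s %s\n' "$sha" "$repo" >> "$out"
    done
}

# restore_fingerprint FILE
# Check every recorded repository back out to its recorded commit.
restore_fingerprint() {
    while read -r sha repo; do
        (cd "$repo" && git checkout -q "$sha") || return 1
    done < "$1"
}
```

For example, 'save_fingerprint today.txt . libraries/base testsuite'
records three repositories, and 'restore_fingerprint today.txt' later
returns them to that state (as detached checkouts).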

The GHC build bots, driven by Ben L.'s "Buildbox" library, automatically
run the 'fingerprint.py' script during nightly builds, from what I
remember. It may be possible to just look in the ghc-builds archives,
and steal some fingerprints from the last month off one of the
buildbots. I don't know who maintains the individual bots; perhaps you
can ask the list. However, this will at best give you a 1-day level of
granularity, rather than commit-level granularity, which is still
rather unsatisfying.

------------- Answer over, rant begins. ---------------------

I know we had this discussion fairly recently, but can
someone *please* explain why we are in this situation of half
submodules, half random-floating-git-repository-checkouts? It's
terrible. I'm frankly surprised we've been doing it this long:
over a year or more? It is literally the worst of submodules and
free-standing repositories put together, with none of the advantages
of either.

Free-standing repos are attractive because they are just there, and
you don't have to 'maintain' them (sort of.) Submodules are attractive
because they identify the critical points in which your repositories
depend on each other. We have neither benefit right now, clearly.

In particular, this makes it impossible to use tools like 'git bisect',
which is *incredibly* useful for just these exact cases. Hell, you can
even make 'git bisect' work almost 100% automatically with a tiny bit
of shell scripting.

http://mainisusuallyafunction.blogspot.com/2012/09/tracking-down-unused-variables-with.html

You could just instead have a script that built the compiler, and ran
the built compiler on your testcase, after every bisection. Wouldn't
it be *great* to have something like that Just Work? A tool like this
could potentially boil down Kazu's bug almost automatically for
example, with little-to-no frustrating intervention.
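
As a rough illustration of what such a script could look like, here is
a sketch of a step script for 'git bisect run'. The file name
'bisect-step.sh' and the build and test commands are placeholders, not
anything that exists in the GHC tree, and it only makes sense once a
single GHC commit pins down every library. 'git bisect run' treats exit
code 0 as good, 125 as "cannot test this commit, skip it", and any
other code from 1 to 127 as bad.

```shell
#!/bin/sh
# Sketch of a step script for `git bisect run` (placeholder commands).
# With this file saved as bisect-step.sh (hypothetical name):
#   git bisect start <bad-commit> <good-commit>
#   git bisect run sh bisect-step.sh '<build command>' '<test command>'

# bisect_step BUILD_CMD TEST_CMD
bisect_step() {
    build_cmd=$1
    test_cmd=$2
    # A commit that does not build tells us nothing about the bug,
    # so ask bisect to skip it rather than mark it bad.
    sh -c "$build_cmd" || return 125
    # Map the test result onto exactly 0 (good) or 1 (bad), so that a
    # crash with an exit code above 127 does not abort the bisection.
    if sh -c "$test_cmd"; then
        return 0
    else
        return 1
    fi
}

# When invoked with exactly two arguments, act as the bisect step.
if [ "$#" -eq 2 ]; then
    bisect_step "$1" "$2"
    exit $?
fi
```

The build command would have to include whatever brings the libraries
in line with the commit being tested (for instance 'git submodule
update' if they were all submodules), which is exactly the piece that
is missing today.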

And even now, looking at the list of repositories in libraries/ that
are not submodules, I really see no reason why more -
or even all - of them cannot be submodules. Is it a workflow issue of
some sort? That's what I'm thinking at this point, but I also don't
think it could be any worse than it is now.

Realistically, very few libraries GHC needs for bootstrapping seem to
change that much. unix, integer-simple, haskeline and filepath for
example change *extremely* infrequently, but all are free-standing.
Why? In the event they were submodules, would anything actually be
lost?

The maintainer - that is, not GHC HQ - would still 'own' the official
repository. They can make changes to it. But if there is a necessity
to pull that in for GHC (feature request, bug fix, random thing) it
can be done by updating the submodule pointer to the new commit. But
this must happen explicitly by a GHC committer. In the event they
update the submodule pointer, they should also obviously make sure the
build still works.
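
Mechanically, that update is small. A sketch, assuming the library
already is a submodule and the wanted commit has already been fetched
into it; the function name 'bump_submodule' and the commit message are
made up for illustration:

```shell
#!/bin/sh
# Sketch (hypothetical helper): move a submodule to a given commit and
# record the new pointer in the superproject.
# Usage, from the top of the superproject:
#   bump_submodule libraries/unix <commit>
bump_submodule() {
    path=$1
    commit=$2
    # Assumes $commit is already present in the submodule's repository.
    (cd "$path" && git checkout -q "$commit") || return 1
    # Staging the path records the submodule's new HEAD as the pointer.
    git add "$path" || return 1
    # Note: this commits anything else that is already staged as well.
    git commit -q -m "Update $path to $commit"
}
```

Checking that the build still works belongs between the checkout and
the commit; it is left out here because the build command varies.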

That means we have to update the submodule pointers ourselves if
things change. That sucks I guess, but really, aside from base and
testsuite, the two most frequently changing repositories, is that
*actually* going to cost us a lot of work?

And even if it does cost us work, I'll speak for myself: I will gladly
pay for that work and do it all myself if it means I can actually
bisect and actually roll back my tree to some point to fix things -
without needing to prepare for it months in advance using hacks, like
creating thousands of fingerprints by running fingerprint.py every day
when people make commits (no, I haven't done this, but it could be
done, and I really don't want to do it).

Long-term reproducible builds are, IMO, a must for any project.
*Especially* a project of our size. *Especially* a compiler of all
things. But as it stands, when you build GHC, you can probably
reproduce *today's* results and *today's* bugs. Last month's results?
Last year's? Finding the difference between those months ago and today?
Good luck - you will need it.

On Tue, Jun 4, 2013 at 8:07 PM, Kazu Yamamoto <kazu at iij.ad.jp> wrote:
> Hi,
>
> Andreas and I found that the new IO manager is not working properly in
> the current GHC head. I'm sure that it worked well at least on May 7.
>
> We need to narrow the range of commits, so I did:
>
>   % git checkout bb2795db36b36966697c228315ae20767c4a8753
>   % git submodule update
>
> But this does not check out proper submodules. For instance,
> libraries/base has newer commits. And of course, building fails.
>
> Please tell us how to checkout proper submodules against a specific
> GHC tree.
>
> --Kazu



-- 
Regards,
Austin - PGP: 4096R/0x91384671


