<div dir="ltr"><div>When I had to roll everything back to a known good state, I patched sync-all to check out all the repos to the state they were in on a given date. It's pretty naive, but it should at least be usable for manual bisecting. I can't find the old mailing list archives, so I'm attaching the patch here.<br>
<br></div>Niklas<br></div><div class="gmail_extra"><br><br><div class="gmail_quote">2013/6/5 Austin Seipp <span dir="ltr"><<a href="mailto:aseipp@pobox.com" target="_blank">aseipp@pobox.com</a>></span><br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
(Warning: incoming answer, followed by a rant.)<br>
<br>
Base is not a submodule, meaning that there is essentially no way to<br>
automatically check it back out to the "exact same state" it was in,<br>
given some specified GHC commit - the commit IDs are not tracked.<br>
<br>
At this point, you are basically on your own. You'll have to manually<br>
check out libraries/base to a specific commit that occurred 'around'<br>
the same time as the GHC commit. In this case, that means looking<br>
through whatever commits hit HEAD on May 7th:<br>
<br>
$ cd libraries/base<br>
$ git log --until="May 7th"<br>
<br>
The resulting list will show you what happened up to May 7th. Take the<br>
latest commit in that list, and check out base to that revision. Any<br>
commits afterward happened on May 8th or later:<br>
<br>
$ git checkout -b temporary-io-fix &lt;sha1 of latest May 7th commit&gt;<br>
<br>
You're going to need to do this for every module that is not tracked<br>
as a submodule. Most of the repositories are very low-activity. base &<br>
testsuite are going to be the annoying ones.<br>
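<br>
A rough sketch of doing that for several repositories at once, assuming a<br>
git new enough to support 'git -C'. The function names and the PIN_REF<br>
variable are made up for illustration, and the list of repositories is<br>
something you supply yourself. Note that --before compares committer<br>
dates, so treat the result as a starting point, not a guarantee:<br>
<br>
```shell
#!/bin/sh
# Pin free-standing repositories to their state as of a given date.

# commit_before REPO DATE: print the newest commit reachable from
# $PIN_REF (default: HEAD) whose committer date is not after DATE.
commit_before() {
    git -C "$1" rev-list -1 --before="$2" "${PIN_REF:-HEAD}"
}

# checkout_as_of DATE REPO...: detach each REPO at that commit and
# print "sha path" for each one, so the result can be saved.
checkout_as_of() {
    as_of=$1
    shift
    for repo in "$@"; do
        sha=$(commit_before "$repo" "$as_of") || return 1
        if [ -z "$sha" ]; then
            echo "no commit before $as_of in $repo" >&2
            return 1
        fi
        git -C "$repo" checkout --quiet --detach "$sha" || return 1
        echo "$sha $repo"
    done
}
```
<br>
For example: checkout_as_of "2013-05-08 00:00:00" libraries/base testsuite<br>
Once a repo is detached at an old commit, HEAD no longer reaches the newer<br>
ones, so when stepping forward again set PIN_REF to a branch such as<br>
origin/master.<br>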
<br>
You'll have to continue this 'manual bisection' by hand, with a very<br>
hefty dose of frustrating trial-and-error, in my experience.<br>
<br>
There is a secondary alternative. GHC has a script called<br>
'fingerprint.py' (in utils/fingerprint/) which is somewhat designed to<br>
work around this deficiency (very poorly). This script basically dumps<br>
out a text file, containing a key/value pair mapping every repository<br>
to its current HEAD commit. It can then take that text file and<br>
automatically do 'git checkout' for you in every repo. The idea is you<br>
can take fingerprints of the tree, save the results, and cleanly check<br>
out to some state later.<br>
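<br>
The idea in miniature, as a sketch only: this is not fingerprint.py and<br>
does not read or write its file format. 'snapshot' and 'restore' are<br>
hypothetical names, and the "sha path" line format is an assumption made<br>
here for illustration:<br>
<br>
```shell
#!/bin/sh
# Record and restore the HEAD commit of a set of repositories.

# snapshot REPO...: print one "sha path" line per repository.
snapshot() {
    for repo in "$@"; do
        sha=$(git -C "$repo" rev-parse HEAD) || return 1
        printf '%s %s\n' "$sha" "$repo"
    done
}

# restore: read "sha path" lines on stdin and detach each repository
# at the recorded commit. The path is read last so it may contain spaces.
restore() {
    while read -r sha repo; do
        git -C "$repo" checkout --quiet --detach "$sha" || return 1
    done
}
```
<br>
For example, save the output of 'snapshot . libraries/base testsuite' to a<br>
file, and later run 'cat that-file | restore' from the same directory.<br>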
<br>
The GHC build bots run by Ben L.'s "Buildbox" library automatically<br>
run the 'fingerprint.py' script during nightly builds, from what I<br>
remember. It may be possible to just look in the ghc-builds archives,<br>
and steal some fingerprints from the last month off one of the<br>
buildbots. I don't know who maintains the individual bots; perhaps you<br>
can ask the list. However, this will at best give you a 1-day level of<br>
granularity, rather than commit level granularity, which is still<br>
rather unsatisfying.<br>
<br>
------------- Answer over, rant begins. ---------------------<br>
<br>
I know we had this discussion fairly recently, but can<br>
someone *please* explain why we are in this situation of half<br>
submodules, half random-floating-git-repository-checkouts? It's<br>
terrible. I'm frankly surprised we've been doing it this long,<br>
a year or more now. It is literally the worst of submodules and<br>
free-standing repositories put together, with none of the advantages<br>
of either.<br>
<br>
Free-standing repos are attractive because they are just there, and<br>
you don't have to 'maintain' them (sort of.) Submodules are attractive<br>
because they identify the critical points in which your repositories<br>
depend on each other. We have neither benefit right now, clearly.<br>
<br>
In particular, this makes it impossible to use tools like 'git bisect',<br>
which is *incredibly* useful for exactly these cases. Hell, you can<br>
even make 'git bisect' work almost 100% automatically with a tiny bit<br>
of shell scripting.<br>
<br>
<a href="http://mainisusuallyafunction.blogspot.com/2012/09/tracking-down-unused-variables-with.html" target="_blank">http://mainisusuallyafunction.blogspot.com/2012/09/tracking-down-unused-variables-with.html</a><br>
<br>
You could just instead have a script that built the compiler, and ran<br>
the built compiler on your testcase, after every bisection. Wouldn't<br>
it be *great* to have something like that Just Work? A tool like this<br>
could potentially boil down Kazu's bug almost automatically for<br>
example, with little-to-no frustrating intervention.<br>
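<br>
A sketch of what such a step could look like. It relies on the exit codes<br>
'git bisect run' documents: 0 means good, 125 means "cannot test this<br>
commit, skip it", and other codes from 1 to 127 mean bad. The function<br>
name, the build command and the test command are all placeholders, and<br>
this only helps once every library is pinned by the GHC commit being<br>
tested, which is exactly what we don't have today:<br>
<br>
```shell
#!/bin/sh
# One bisection step: build the tree, then run a test case against it.

# bisect_step BUILD_CMD TEST_CMD
bisect_step() {
    build_cmd=$1
    test_cmd=$2
    # A tree that does not build cannot be judged either way;
    # 125 tells 'git bisect run' to skip this commit.
    sh -c "$build_cmd" || return 125
    # Any failure of the test case is reported as 1 (bad).
    sh -c "$test_cmd" || return 1
    # Build and test both succeeded: the commit is good.
    return 0
}

# Hypothetical use, with this file saved as bisect-step.sh and
# GOOD_SHA set to a commit known to work:
#   git bisect start HEAD "$GOOD_SHA"
#   git bisect run sh -c '. ./bisect-step.sh; bisect_step "make" "./run-testcase.sh"'
```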
<br>
And even now, looking at the listing of repositories in<br>
libraries/ that are not submodules, I really see no reason why more -<br>
or even all - of them cannot be submodules. Is it a workflow issue of<br>
some sort? That's what I'm thinking at this point, but I also don't<br>
think it could be any worse than it is now.<br>
<br>
Realistically, very few libraries GHC needs for bootstrapping seem to<br>
change that much. unix, integer-simple, haskeline and filepath for<br>
example change *extremely* infrequently, but all are free-standing.<br>
Why? In the event they were submodules, would anything actually be<br>
lost?<br>
<br>
The maintainer - that is, not GHC HQ - would still 'own' the official<br>
repository. They can make changes to it. But if there is a necessity<br>
to pull that in for GHC (feature request, bug fix, random thing) it<br>
can be done by updating the submodule pointer to the new commit. But<br>
this must happen explicitly by a GHC committer. In the event they<br>
update the submodule pointer, they should also obviously make sure the<br>
build still works.<br>
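<br>
Concretely, a pointer update is a small commit in the GHC repository. A<br>
sketch, assuming the library is already registered as a submodule and a<br>
git that supports 'git -C'; 'bump_submodule' is a hypothetical helper,<br>
and building and validating before pushing is still up to the committer:<br>
<br>
```shell
#!/bin/sh
# Move a submodule to a given upstream commit and record that in the
# superproject. Run from the top of the superproject.

# bump_submodule PATH SHA
bump_submodule() {
    path=$1
    sha=$2
    # Fetch the maintainer's latest history into the submodule checkout.
    git -C "$path" fetch --quiet || return 1
    # Put the submodule's working tree at the requested commit.
    git -C "$path" checkout --quiet --detach "$sha" || return 1
    # Stage the new commit ID (the "pointer") and commit only that path.
    git add "$path" || return 1
    git commit --quiet -m "Update $path submodule to $sha" -- "$path"
}
```
<br>
For example: bump_submodule libraries/unix followed by the commit ID you<br>
want GHC to use.<br>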
<br>
That means we have to update the submodule pointers ourselves if<br>
things change. That sucks I guess, but really, aside from base and<br>
testsuite, the two most frequently changing repositories, is that<br>
*actually* going to cost us a lot of work?<br>
<br>
And even if it does cost us work, I'll speak for myself: I will gladly<br>
pay for that work and do it all myself if it means I can actually<br>
bisect and actually roll back my tree to some point to fix things -<br>
without needing to prepare for it months in advance using hacks. Like<br>
creating thousands of fingerprints, using fingerprint.py every day<br>
when people make commits (no, I haven't done this, but it could be<br>
done, and I really don't want to do it.)<br>
<br>
Long-term reproducible builds are, IMO, a must for any project.<br>
*Especially* a project of our size. *Especially* a compiler of all<br>
things. But as it stands, when you build GHC, you can probably<br>
reproduce *today's* results and *today's* bugs. Last month's results?<br>
Last year's? Finding the difference between those months ago and today?<br>
Good luck - you will need it.<br>
<div class="im HOEnZb"><br>
On Tue, Jun 4, 2013 at 8:07 PM, Kazu Yamamoto <<a href="mailto:kazu@iij.ad.jp">kazu@iij.ad.jp</a>> wrote:<br>
</div><div class="HOEnZb"><div class="h5">> Hi,<br>
><br>
> Andreas and I found that the new IO manager is not working properly in<br>
> the current GHC head. I'm sure that it worked well at least on May 7.<br>
><br>
> We need to narrow the range of commits, so I did:<br>
><br>
> % git checkout bb2795db36b36966697c228315ae20767c4a8753<br>
> % git submodule update<br>
><br>
> But this does not checkout proper submodules. For instance,<br>
> libraries/base has newer commits. And of course, building fails.<br>
><br>
> Please tell us how to checkout proper submodules against a specific<br>
> GHC tree.<br>
><br>
> --Kazu<br>
><br>
> _______________________________________________<br>
> ghc-devs mailing list<br>
> <a href="mailto:ghc-devs@haskell.org">ghc-devs@haskell.org</a><br>
> <a href="http://www.haskell.org/mailman/listinfo/ghc-devs" target="_blank">http://www.haskell.org/mailman/listinfo/ghc-devs</a><br>
<br>
<br>
<br>
</div></div><span class="HOEnZb"><font color="#888888">--<br>
Regards,<br>
Austin - PGP: 4096R/0x91384671<br>
</font></span><div class="HOEnZb"><div class="h5"><br>
</div></div></blockquote></div><br></div>