[Haskell-cafe] announcing darcs 2.0.0pre1, the first prerelease for darcs 2

Mon Dec 10 15:24:37 EST 2007

We are happy to announce the first prerelease version of darcs 2! Darcs 2
will feature numerous improvements, and this prerelease will also feature a
few regressions, so we're looking for help, from both Haskell developers
and users willing to try this release out.  Read below, to see how you can
benefit from this new release, and how you can help us to make the final
darcs 2 release the best ever! (for the latter, see
http://wiki.darcs.net/index.html/DarcsTwo/HowToHelp)

(for an expanded version of this announcement, see
http://wiki.darcs.net/index.html/DarcsTwo)

Darcs 2 features user-visible changes in two broad categories, and several
under-the-hood improvements designed to improve code stability and safety.
The user-visible changes are a new "hashed" repository format, and the new
darcs-2 conflict handling.  The new "hashed" repository format can be used
in a manner that is interchangeable with older darcs--although older
versions of darcs cannot read the hashed format, darcs 2 allows you to
exchange patches between repositories in new and old formats.  The new
conflict handling benefits from the new hashed format, but also requires a
repository conversion that is not backwards-compatible, so projects
switching to darcs-2 format will have to require that all their users upgrade
to darcs 2.

=== Getting darcs 2 ===
You can get a prerelease version of darcs 2 either by getting the latest
unstable darcs

darcs get http://darcs.net/repos/unstable

or by downloading the prerelease tarball from
http://darcs.net/darcs-2.0.0pre1.tar.gz.  Once you've compiled your new
darcs, you could take it for a test drive by getting a fresh copy of darcs
with the hashed repository format:

darcs get http://darcs.net/repos/unstabled-hashed

= Hashed repository format =

We expect that most testers of darcs 2 will only try the hashed repository
format.  While we'd prefer to also have many users testing out actual
darcs-2 format repositories, the two codebases have much in common, so
tests of the hashed format will greatly help us in improving darcs 2 as a
whole. 

The hashed repository format has a number of changes that are visible to
users.

 1. The hashed format allows for greater atomicity of operations.  This
 makes for greater safety and simultaneously greater efficiency.  These
 benefits, however, have not been fully realized in this release.  For
 instance, with a hashed repository, there is in principle no need for
 darcs push to require a repository lock, so you could record patches
 while waiting for a push to finish (say, if it's waiting on the test
 suite).

 2. The _darcs/pristine directory no longer holds the pristine cache.  This
 disallows certain hackish short-cuts, but also dramatically reduces the
 danger of third-party programs (e.g. DreamWeaver) recursing into the
 pristine cache and corrupting darcs repositories.

 3. Darcs get is now much faster, and always operates in a "lazy" fashion,
 meaning that patches are downloaded only when they are needed.  This gives
 us many of the benefits of --partial repositories, without most of their
 disadvantages.  This approach, however, does have several new dangers.
 First, some operations may unexpectedly require the download of large
 numbers of patches, which could be slow (but you could always interrupt
 with ^C).  Secondly, if the source repository disappears, or you lose
 network connectivity, some operations may fail.  I do not believe these
 dangers will prove particularly problematic, but we may need to fine-tune
 the user interface to make it more clear what is going on.

 4. Darcs now supports caching of patches and file contents to reduce
 bandwidth and save disk space.  See below for how to enable this.  In my
 opinion, this is actually the most exciting new feature, as it greatly
 speeds up a number of operations, and is essentially transparent.  The
 only reason we don't enable it by default is that I'm uncomfortable
 creating a large directory in ~/.darcs/cache without the user's explicit
 consent.

=== Creating a repository in the hashed format ===

Creating a hashed repository is as easy as

darcs get --hashed oldrepository newrepository

or alternatively you could create a fresh repository with

darcs initialize --hashed

You can push, pull and send patches at will between hashed and
old-fashioned repositories, so you should be able to experiment with this
format even on projects that you do not control.

=== Enabling a global cache ===

It is very simple to enable a global cache.  Simply execute

$ mkdir -p "$HOME/.darcs/cache"
$ echo "cache:$HOME/.darcs/cache" > "$HOME/.darcs/sources"

This will cause darcs to store hard links in ~/.darcs/cache.  It is always
safe to delete this directory.
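
To see why deleting the cache is harmless, here is a small sketch of how
hard links behave.  It does not touch darcs at all: the temporary directory
and the file name 0000000001-abc are made up for illustration, standing in
for a hashed file that is hard-linked between a repository and the cache as
described above.

```shell
# Sketch only: uses a throwaway directory, not a real darcs repository.
demo=$(mktemp -d)
mkdir "$demo/repo" "$demo/cache"

# Stand-in for a hashed file stored inside a repository (hypothetical name).
echo "patch contents" > "$demo/repo/0000000001-abc"

# A hard link gives the same data a second name without copying it.
ln "$demo/repo/0000000001-abc" "$demo/cache/0000000001-abc"

# Removing the cache removes only the extra name; the repository copy remains.
rm -rf "$demo/cache"
cat "$demo/repo/0000000001-abc"
```

The last command still prints the file's contents, because removing one
name of a hard-linked file leaves the data reachable under the other.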

= Darcs-2 merging =

The future of darcs is in the darcs-2 repository format, which features a
new merge algorithm that introduces two major user-visible changes:

 1. It should no longer be possible to confuse darcs or freeze it
 indefinitely by merging conflicting changes.  However, this feature
 '''needs to be tested''', so please, do your worst, and let us know how
 darcs can handle it!

 2. Identical primitive changes no longer conflict.  This is a
 long-requested feature, and has far-reaching implications.  See below (the
 section on "new semantics") for a discussion of these implications.

=== Creating a repository in the darcs-2 format ===

Converting an existing repository to the darcs-2 format is as easy as

darcs convert oldrepository newrepository

However, the convert command does run rather slowly.  Moreover, you should
ideally only perform this command once per project, as the conversion is
not reversible, and its result is dependent on the order of patches in your
repository.  Of course, you can experiment all you like, but projects
should switch to darcs-2 format in unison, and only after the final release
of darcs 2.

You can also create a fresh repository with

darcs initialize --darcs-2

== Changes in semantics ==

When using the darcs-2 format, darcs treats identical primitive patches as
the ''same'' patch.  This has dramatic implications in how darcs-2 will
define dependencies.  In particular, dependencies (except those explicitly
created by the user with --add-deps) are always dependencies on a
given ''primitive'' patch, not on a given named patch.  This means that the
change named "foo" may in effect depend on ''either the change named "bar"
or the change named "baz"''.  This prerelease of darcs 2 has not been fully
converted to always take advantage of these new semantics--it will not
cause corruption, but under unusual circumstances, could exit with an
error. We need to decide how to handle these semantics in the user
interface.

=== A simple example ===

Let me illustrate what could happen with a story. Steve creates changes "A"
and "B":

steve$ echo A > foo
steve$ darcs add foo
steve$ darcs record -m A
steve$ echo B > foo
steve$ darcs record -m B

Meanwhile, Monica also decides she'd like a file named foo, and she also
wants it to contain A, but she also wants to make some other changes:

monica$ echo A > foo
monica$ darcs add foo
monica$ echo Z > bar
monica$ darcs add bar
monica$ darcs record -m AZ

At this point, Monica pulls from Steve:

monica$ darcs pull ../steve

but she decides she prefers her AZ change to Steve's A change, and being a
harsh person, she decides to obliterate his change:

monica$ darcs obliterate --match 'exact A' --all

At this point, darcs 1 would complain, pointing out that patch B depends on
patch A.  However, darcs 2 will happily obliterate patch A, because patch
AZ provides the primitive patches that B depends upon. At this point,
however, we run into the limitations of this prerelease version of darcs 2:
If Steve pulls from Monica, his darcs will fail, because the common set of
patches (which is only B) cannot exist without either A or AZ.  I plan to
fix this behavior, but the internal API for doing so is not at all clear to
me, which is why I'm looking for input from others.  But note that this
situation can only occur if users take advantage of the new semantics,
which I suspect will be relatively seldom, until we give them tools to more
easily do so (see below).
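
One way to picture the story above is a toy model in shell, where each
named patch is just a list of primitive changes.  This is only a sketch
under my own assumptions, not how darcs represents patches: the labels such
as addfile:foo and the helper function "provided" are invented for the
illustration.

```shell
# Toy model, not darcs internals: a named patch is a list of primitive changes.
patch_A="addfile:foo hunk:foo:A"
patch_AZ="addfile:foo hunk:foo:A addfile:bar hunk:bar:Z"
# B needs the primitive changes it builds on, whichever named patch supplies them.
needs_B="addfile:foo hunk:foo:A"

# provided NEEDS PRIMS...: succeed if every word in NEEDS appears among PRIMS.
provided() {
    needs=$1; shift
    for n in $needs; do
        found=no
        for p in "$@"; do
            [ "$p" = "$n" ] && found=yes
        done
        [ "$found" = yes ] || return 1
    done
    return 0
}

# With A obliterated, AZ still supplies everything B needs.
if provided "$needs_B" $patch_AZ; then after_obliterate=ok; else after_obliterate=broken; fi
# With neither A nor AZ present, B has nothing to stand on.
if provided "$needs_B"; then without_both=ok; else without_both=broken; fi

echo "A obliterated, AZ kept: $after_obliterate"
echo "A and AZ both missing: $without_both"
```

In this model the first check reports ok and the second reports broken,
which mirrors the two halves of the story: Monica can drop A because AZ
carries the same primitive changes, while B on its own has no primitive
changes to depend on.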

=== A few implications ===

At first this may look like a regression.  Certainly, it took me a long
conversation (with Steve, immortalized in the above example--but in truth,
Monica would never be so unkind as to obliterate Steve's change) to
determine that this behavior is actually a Good Thing, and that the
potential confusion among users is a relatively small danger. The main
lesson regarding these new semantics is that ''patches depend on primitive
patches, not on named patches''.  A named patch is really just a set of
primitive patches.  Once we train darcs to take advantage of this feature,
several tantalizing possibilities open up:

 1. As the above example illustrates, in certain circumstances we can
 obliterate patches that are depended upon by other patches.  We could
 automate this, by enabling obliterate (perhaps given a flag?) to break
 apart the patch it's trying to obliterate, and leave behind only those
 primitive patches which are depended upon by other patches.  Perhaps we
 could call this new feature "atomization".  Of course, this applies
 equally well to unrecord.

 2. Recognizing that amend-record is equivalent to unrecord followed by
 record in a clean repository, it becomes clear that with atomization we
 could amend even patches that later patches depend upon.

I guess these two (three?) examples are all that come to mind, but they're
big examples, features that have been requested many times over the years.
Of course, there would be some debris left over, a portion of the patch
that got atomized, but this debris would be minimal, and seems to me to be
necessary.

            ========================================

The version of this announcement on the wiki has more discussion of
implementation plans and how you can contribute.  I urge you to join in the
fun of testing and optimizing this new version of darcs!

David