Difference between revisions of "Git"

From HaskellWiki
(Added section "Further reading" and a link to "Understanding the Git Workflow")
(Improved language and added links)
 
Line 8: Line 8:
 
== Introduction ==
 
== Introduction ==
   
[http://git-scm.com/ Git] is a distributed revision control system, used by many Haskellers. [[Darcs]] is also popular, but tends to get slow when projects grow large. [https://github.com/ GitHub] is a site that is used, amongst others, for many open source [https://github.com/search?q=Haskell&type=&ref=simplesearch Haskell projects].
+
[http://git-scm.com/ Git] is a distributed revision control system, used by many Haskellers. [[Darcs]] is also popular, but it tends to get slow when projects grow large. [https://github.com/ GitHub] is a site for Git based projects that is used, amongst others, for many open source [https://github.com/search?q=Haskell&type=&ref=simplesearch Haskell projects].
   
   
 
== The DAG ==
 
== The DAG ==
   
Each node of the [http://en.wikipedia.org/wiki/Directed_acyclic_graph DAG] is uniquely identified by a reference, and represent an immutable history point (commit).
+
Each node of the [http://en.wikipedia.org/wiki/Directed_acyclic_graph DAG] is uniquely identified by a reference, and represents an immutable history point (commit).
   
   
Line 20: Line 20:
 
Branches and tags point to the DAG through a reference. They provide a way to name entry points in the DAG that are meaningful to the user.
 
Branches and tags point to the DAG through a reference. They provide a way to name entry points in the DAG that are meaningful to the user.
   
Branches contains references that usually change once work is done on the branch.
+
Branches contain references that usually change once work is done on the branch.
   
Tags are essentially the same as a branch except that by design they name a specific reference and usually do not change.
+
Tags are essentially the same as a branch, except that, by design, they name a specific reference and usually do not change.
   
   
Line 37: Line 37:
 
</haskell>
 
</haskell>
   
- a tree contains name associated by reference with blobs or trees. This represent a filesystem hierarchy, with trees representing directories, and blobs representing files:
+
- a tree contains name associated by reference with blobs or trees. This represents a filesystem hierarchy, with trees representing directories, and blobs representing files:
   
 
<haskell>
 
<haskell>
Line 44: Line 44:
 
</haskell>
 
</haskell>
   
- a tag object is a signed reference with a the signature's author.
+
- a tag object is a reference object, containing the author's signature.
   
 
<haskell>
 
<haskell>
Line 51: Line 51:
 
</haskell>
 
</haskell>
   
- a commit is essentially a node of the DAG with associated metadata (who created this commit, at which time). A commit point to a arborescence through a tree reference, and may have parents which are tree references:
+
- a commit is essentially a node of the DAG with associated metadata (who created this commit, at which time). A commit points to an [http://en.wikipedia.org/wiki/Arborescence_(graph_theory) arborescence] through a tree reference, and may have parents which are tree references:
   
 
<haskell>
 
<haskell>
Line 65: Line 65:
 
=== The object store ===
 
=== The object store ===
   
All the different objects in Git - individual files, entire directory trees, commits and other things - are stored in a repository-wide central store. Each object is identified by computing a SHA-1 hash, which is a function of only the object's contents.
+
All the different objects in Git - individual files, entire directory trees, commits and other things - are stored in a repository-wide central store. Each object is identified by computing a [http://en.wikipedia.org/wiki/SHA-1 SHA-1] hash, which is a function of only the object's contents.
   
 
=== But... doesn't that mean that when I change a single line in a file, a whole new copy is stored? ===
 
=== But... doesn't that mean that when I change a single line in a file, a whole new copy is stored? ===
Line 84: Line 84:
 
* [https://sandofsky.com/blog/git-workflow.html Understanding the Git Workflow]
 
* [https://sandofsky.com/blog/git-workflow.html Understanding the Git Workflow]
   
  +
* [http://en.wikipedia.org/wiki/Git_(software) The Wikipedia article on Git]
   
   

Latest revision as of 06:26, 3 April 2013

This article is a stub. You can help by expanding it.

WORK IN PROGRESS

This page aims to introduce the concepts behind Git in a "Haskell way".


Introduction

Git is a distributed revision control system used by many Haskellers. Darcs is also popular, but it tends to get slow when projects grow large. GitHub is a site for Git-based projects; among many others, it hosts a large number of open source Haskell projects.


The DAG

Each node of the DAG is uniquely identified by a reference, and represents an immutable history point (commit).


Branches and tags

Branches and tags point to the DAG through a reference. They provide a way to name entry points in the DAG that are meaningful to the user.

A branch holds a reference that usually changes as work is done on the branch.

A tag is essentially the same as a branch, except that, by design, it names one specific reference and usually does not change.
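
The difference between the two kinds of names can be sketched in a few lines. This is only an illustration, not how Git stores refs: the types CommitRef, RefKind and Refs and the function advance are hypothetical names introduced here, and a commit reference is modelled as a plain string.

```haskell
import qualified Data.Map as Map

-- Hypothetical stand-in for a commit reference (a SHA-1 hash in Git).
newtype CommitRef = CommitRef String deriving (Eq, Show)

-- A name is either a branch or a tag; both simply point at a commit.
data RefKind = Branch | TagRef deriving (Eq, Show)

-- All named entry points into the DAG.
type Refs = Map.Map String (RefKind, CommitRef)

-- Recording new work under a name moves a branch to the new commit.
-- A tag is left untouched, mirroring "usually do not change".
advance :: String -> CommitRef -> Refs -> Refs
advance name new = Map.adjust move name
  where
    move (Branch, _) = (Branch, new)
    move tagged      = tagged
```

With a branch and a tag that both start at the same commit, calling advance on each name moves only the branch.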


Objects

Kinds of objects

There are 4 kinds of objects: tag, blob, tree and commit.

- a blob contains data.

data Blob = Blob ByteString

- a tree contains names, each associated by reference with a blob or a tree. This represents a filesystem hierarchy, with trees representing directories and blobs representing files:

data TreeContent = T TreeReference | B BlobReference
type Tree = [(Name, TreeContent)]

- a tag object is a named reference to another object, together with the author's signature:

type SignatureBlob = ByteString
data Tag = Tag ObjectReference Name SignatureBlob

- a commit is essentially a node of the DAG with associated metadata (who created the commit, and when). A commit points to an arborescence through a tree reference, and may have parents, which are references to other commits:

data Commit = Commit
        { tree      :: TreeReference
        , parents   :: [CommitReference]
        , author    :: (Name, Time)
        , committer :: (Name, Time)
        , message   :: ByteString
        }
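
To make the tree object more concrete, here is a minimal sketch of how a tree unfolds into a filesystem hierarchy. It assumes a simplified model that is not Git's actual representation: references are plain integers, the store is a Map, and the names Object, Store and filesUnder are hypothetical.

```haskell
import qualified Data.Map as Map

type Name = String
type Ref  = Int  -- hypothetical; Git uses SHA-1 hashes

-- Only the two object kinds needed to describe a directory hierarchy.
data Object
  = BlobObj String          -- file contents
  | TreeObj [(Name, Ref)]   -- directory entries, by reference

type Store = Map.Map Ref Object

-- List the path of every file below the tree with the given reference.
-- Entries whose reference is missing from the store are skipped.
filesUnder :: Store -> Ref -> [FilePath]
filesUnder store ref =
  case Map.lookup ref store of
    Just (TreeObj entries) -> concatMap entryPaths entries
    _                      -> []
  where
    entryPaths (name, child) =
      case Map.lookup child store of
        Just (BlobObj _) -> [name]
        Just (TreeObj _) -> map ((name ++ "/") ++) (filesUnder store child)
        Nothing          -> []
```

A commit's tree reference would be the starting point of such a walk, which is how a single commit can describe an entire directory snapshot.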

The object store

All the different objects in Git - individual files, entire directory trees, commits and other things - are stored in a repository-wide central store. Each object is identified by computing a SHA-1 hash, which is a function of only the object's contents.
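
The idea of a store keyed by a hash of the contents can be sketched as follows. The hash function toyHash is a deliberately simple stand-in for SHA-1 and is not cryptographic; it, the Store type and the function put are assumptions made for this illustration only.

```haskell
import qualified Data.Map as Map
import Data.Char (ord)
import Data.List (foldl')

-- Toy stand-in for SHA-1. NOT cryptographic; for illustration only.
type Hash = Int

toyHash :: String -> Hash
toyHash = foldl' (\h c -> (h * 31 + ord c) `mod` 1000000007) 7

-- The object store: contents indexed by the hash of those contents.
type Store = Map.Map Hash String

-- Storing an object returns its key, which depends only on the contents.
put :: String -> Store -> (Hash, Store)
put contents store = (key, Map.insert key contents store)
  where
    key = toyHash contents
```

Because the key is computed from the contents alone, storing the same contents twice yields the same key and a single entry in the store.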

But... doesn't that mean that when I change a single line in a file, a whole new copy is stored?

Yes.

However, every once in a while, Git compacts similar files together into packfiles, storing only the differences between them. [TO BE EXPANDED?]

Garbage collection and git reflog

When objects are not reachable from any root (like a branch reference), they become dangling and are subject to garbage collection. However, garbage collection does not kick in immediately.
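
Reachability from the roots can be sketched as a graph traversal. This is a simplified model under stated assumptions, not Git's garbage collector: references are strings, the whole store is reduced to a Map from each object to the objects it refers to, and the names Graph, reachable and dangling are hypothetical.

```haskell
import qualified Data.Map as Map
import qualified Data.Set as Set

type Ref = String  -- hypothetical; Git uses SHA-1 hashes

-- Each object mapped to the objects it refers to.
type Graph = Map.Map Ref [Ref]

-- Every object that can be reached from the given roots.
reachable :: Graph -> [Ref] -> Set.Set Ref
reachable graph = go Set.empty
  where
    go seen [] = seen
    go seen (r:rest)
      | r `Set.member` seen = go seen rest
      | otherwise =
          go (Set.insert r seen) (Map.findWithDefault [] r graph ++ rest)

-- Objects in the store that no root leads to: candidates for collection.
dangling :: Graph -> [Ref] -> Set.Set Ref
dangling graph roots =
  Map.keysSet graph `Set.difference` reachable graph roots
```

If two commits share a parent and only one of them is named by a branch, the other one is the only object reported as dangling.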

After a mistake, it is often helpful to look at commit objects by date, regardless of whether they are still reachable, so that they can be restored. You can use git reflog for that.


Further reading