Difference between revisions of "Git"

From HaskellWiki
Jump to navigation Jump to search
Line 5: Line 5:
 
== The DAG ==
 
== The DAG ==
   
  +
Each node of the DAG is uniquely identified by a reference, and represent an immutable history point (commit).
TODO
 
   
 
== Branches and tags ==
 
== Branches and tags ==

Revision as of 12:29, 7 September 2012

WORK IN PROGRESS

This page aims to introduce the concepts behind Git in a "Haskell way".

The DAG

Each node of the DAG is uniquely identified by a reference, and represent an immutable history point (commit).

Branches and tags

TODO

Objects

Kinds of objects

There are 4 kinds of objects: tag, blob, tree and commit.

- a blob contains data.

data Blob = Blob ByteString

- a tree contains name associated by reference with blobs or trees. This represent a filesystem hierarchy, with trees representing directories, and blobs representing files:

data TreeContent = T TreeReference | B BlobReference
data Tree = [ (Name, TreeContent ]

- a tag object is a signed reference with a the signature's author.

type SignatureBlob = ByteString
data Tag = Tag ObjectReference Name SignatureBlob

- a commit is essentially a node of the DAG with associated metadata (who created this commit, at which time). A commit point to a arborescence through a tree reference, and may have parents which are tree references:

data Commit = Commit
        { tree      :: TreeReference
        , parents   :: [TreeReference]
        , author    :: (Name,Time)
        , committer :: (Name, Time)
        , message   :: ByteString
        }

The object store

All the different objects in Git - individual files, entire directory trees, commits and other things - are stored in a repository-wide central store. Each object is identified by computing a SHA-1 hash, which is a function of only the object's contents.

But... doesn't that mean that when I change a single line in a file, a whole new copy is stored?

Yes.

However, every once in a while, git compacts files that are similar together into Packfiles, by storing only their diffs. [TO BE EXPANDED?]

Garbage collection and git reflog

When objects are not reachable from any root (like a branch reference), they become dangling and are subject to garbage collection. However, garbage collection does not kick in immediately.

When making a mistake, it is often helpful to look at commit objects by date independent of whether they are reachable, in order to be able to restore them. You can use git reflog for that.