Difference between revisions of "Git"

From HaskellWiki

Revision as of 10:51, 7 September 2012

WORK IN PROGRESS

This page aims to introduce the concepts behind Git in a "Haskell way".

The DAG

TODO

Branches and tags

TODO

Objects

Kinds of objects

There are 4 kinds of objects: tag, blob, tree and commit.

- a blob contains data.

data Blob = Blob ByteString

- a tree contains names, each associated by reference with a blob or another tree. This represents a filesystem hierarchy, with trees representing directories and blobs representing files:

data TreeContent = T TreeReference | B BlobReference
type Tree = [(Name, TreeContent)]

- a tag object is a signed reference to another object, together with the name of the signature's author.

type SignatureBlob = ByteString
data Tag = Tag ObjectReference Name SignatureBlob

- a commit is essentially a node of the DAG with associated metadata (who created this commit, and when). A commit points to a directory hierarchy through a tree reference, and may have parents, which are references to other commits:

data Commit = Commit
        { tree      :: TreeReference
        , parents   :: [CommitReference]  -- parents are commits, not trees
        , author    :: (Name, Time)
        , committer :: (Name, Time)
        , message   :: ByteString
        }
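
To make the "trees are directories, blobs are files" reading concrete, here is a minimal sketch that flattens a tree into file paths. It is only an illustration: Ref, Object, Store, listFiles and exampleStore are hypothetical names, String stands in for ByteString, and references are plain labels rather than real hashes.

```haskell
import qualified Data.Map as Map
import Data.Map (Map)

-- Hypothetical stand-ins for the types this page leaves undefined.
type Name = String
newtype Ref = Ref String deriving (Eq, Ord, Show)  -- in Git this would be a hash

data TreeContent = T Ref | B Ref deriving (Eq, Show)
type Tree = [(Name, TreeContent)]

-- Only the two object kinds needed for a directory listing.
data Object = BlobObj String | TreeObj Tree deriving (Eq, Show)

-- Assumption: the object store is modelled as an in-memory map.
type Store = Map Ref Object

-- List the path of every blob entry under a tree, depth first.
-- A tree reference that is missing from the store, or that points to a
-- non-tree object, contributes no paths. Blob references are not dereferenced.
listFiles :: Store -> Ref -> [FilePath]
listFiles store ref =
  case Map.lookup ref store of
    Just (TreeObj entries) -> concatMap entryPaths entries
    _                      -> []
  where
    entryPaths (name, B _)   = [name]
    entryPaths (name, T sub) = [name ++ "/" ++ p | p <- listFiles store sub]

-- A root directory holding README and src/Main.hs.
exampleStore :: Store
exampleStore = Map.fromList
  [ (Ref "b1", BlobObj "module Main where")
  , (Ref "b2", BlobObj "A readme")
  , (Ref "t-src", TreeObj [("Main.hs", B (Ref "b1"))])
  , (Ref "t-root", TreeObj [("README", B (Ref "b2")), ("src", T (Ref "t-src"))])
  ]
```

Evaluating listFiles exampleStore (Ref "t-root") yields ["README","src/Main.hs"]. Note that a name lives in the tree entry, not in the blob, so the same blob can appear under several names.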

The object store

All the different objects in Git - individual files, entire directory trees, commits and other things - are stored in a repository-wide central store. Each object is identified by computing a SHA-1 hash, which is a function of only the object's contents.
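
A content-addressed store of this kind might be sketched as follows. This is an illustration under stated assumptions, not Git's implementation: ObjectId, Hasher, put, get and toyHash are hypothetical names, the store is an in-memory Map, and toyHash is a deliberately trivial stand-in for SHA-1 so that the example needs no extra packages.

```haskell
import qualified Data.Map as Map
import Data.Map (Map)

newtype ObjectId = ObjectId String deriving (Eq, Ord, Show)

-- The hash function is a parameter. Git uses SHA-1 here.
type Hasher = String -> ObjectId

-- Assumption: the store is a map from identifier to raw content.
type Store = Map ObjectId String

-- Store some content. The key depends on the content only, so storing
-- the same content twice leaves the store unchanged.
put :: Hasher -> String -> Store -> (ObjectId, Store)
put hash content store =
  let oid = hash content
  in (oid, Map.insert oid content store)

-- Look an object up by its identifier.
get :: ObjectId -> Store -> Maybe String
get = Map.lookup

-- Toy stand-in for a real hash: NOT SHA-1 and not collision resistant.
toyHash :: Hasher
toyHash = ObjectId . show . foldl step 7
  where
    step :: Int -> Char -> Int
    step h c = (h * 31 + fromEnum c) `mod` 1000000007
```

Calling put twice with identical content returns the same ObjectId and a store with a single entry, whereas changing the content in any way produces a different key and therefore a second, complete entry.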

But... doesn't that mean that when I change a single line in a file, a whole new copy is stored?

Yes.

However, every once in a while, Git packs similar objects together into packfiles, in which an object can be stored as a diff (delta) against a similar object rather than as a full copy. [TO BE EXPANDED?]

Garbage collection and git reflog

When objects are not reachable from any root (like a branch reference), they become dangling and are subject to garbage collection. However, garbage collection does not kick in immediately.
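
Reachability can be sketched as a graph traversal from the roots. The code below is a minimal illustration, not Git's actual garbage collector: it assumes the store has been reduced to a map from each object to the objects it references, and Graph, reachable, unreachable and exampleGraph are hypothetical names.

```haskell
import qualified Data.Map as Map
import qualified Data.Set as Set
import Data.Map (Map)
import Data.Set (Set)

type Ref = String

-- Assumption: each object is mapped to the objects it references
-- (a commit to its tree and parents, a tree to its entries, a blob to nothing).
type Graph = Map Ref [Ref]

-- Every object reachable from the given roots, found depth first.
reachable :: Graph -> [Ref] -> Set Ref
reachable graph = go Set.empty
  where
    go seen [] = seen
    go seen (r : rs)
      | r `Set.member` seen = go seen rs
      | otherwise           = go (Set.insert r seen) (children r ++ rs)
    children r = Map.findWithDefault [] r graph

-- Objects in the store that no root reaches: candidates for collection.
unreachable :: Graph -> [Ref] -> Set Ref
unreachable graph roots =
  Map.keysSet graph `Set.difference` reachable graph roots

-- Two commits on a branch (c1, c2) and one abandoned commit (c3).
exampleGraph :: Graph
exampleGraph = Map.fromList
  [ ("c1", ["t1"]), ("c2", ["c1", "t2"]), ("c3", ["c1", "t3"])
  , ("t1", ["b1"]), ("t2", ["b1", "b2"]), ("t3", ["b3"])
  , ("b1", []), ("b2", []), ("b3", [])
  ]
```

With a single branch pointing at c2, unreachable exampleGraph ["c2"] is the set containing b3, c3 and t3: the abandoned commit together with the tree and blob that only it references.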

After a mistake, it is often helpful to look at commits by date, regardless of whether they are still reachable, in order to restore them. git reflog does this: it lists the commits that HEAD has recently pointed to.