Is there any documentation on how Git stores files in its repository? I've tried searching the Internet, but found no usable results. Maybe I'm using the wrong query, or maybe the internal format of a Git repository is a great secret?
Let me explain why I need this rocket-science information: I'm using C# to get file history from a repository, but the libgit2sharp library doesn't currently implement this. So (as a responsible person ;) I need to implement this feature myself and contribute it to the community.
But since the kernel sources moved to GitHub, I don't even know where to start my search.
Many thanks in advance!
The internal format of the repository is extremely simple. Git is in essence a user space file system that's content addressable.
Here's a thumbnail sketch.
Objects
Git stores its internal data structures as objects. There are four kinds of objects: blobs (sort of like files), trees (sort of like directories), commits (snapshots of the file system at particular points in time along with information on how to reach there) and tags (pointers to commits useful for marking important ones).
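To make "content addressable" concrete, here is a minimal Python sketch of how an object's name is derived from its content. It assumes a repository using the default SHA-1 object format; the function name git_object_id is mine, not part of any Git API.

```python
import hashlib


def git_object_id(obj_type: str, content: bytes) -> str:
    """Compute the ID Git would assign to an object (SHA-1 repositories).

    Git hashes a small header, "<type> <size>" followed by a NUL byte,
    together with the raw content.
    """
    header = f"{obj_type} {len(content)}".encode("ascii") + b"\x00"
    return hashlib.sha1(header + content).hexdigest()


# A blob holding the 12 bytes "hello world\n".
print(git_object_id("blob", b"hello world\n"))
# -> 3b18e512dba79e4c8300dd08aeb37f8e728b8dad
```

Two files with identical content therefore map to the same blob, which is why Git only stores them once.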
If you look inside the .git directory of a repository, you'll find an objects directory that contains files named by their SHA-1 hash (the first two hex digits form a subdirectory, the remaining 38 the file name). Each of them represents an object. You can inspect them using the plumbing command git cat-file. An example is a commit object from one of my repositories; you can see the object itself at .git/objects/73/47addd901afc7d237a3e9c9512c9b0d05c6cf7.
You can examine other objects like this. Each commit points to a tree representing the file system at that point in time and has one parent (or more in the case of merge commits; the very first commit has none).
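If you want to read such a file yourself rather than through git cat-file, something like the following should work for loose objects. This is only a sketch: read_loose_object is a name I made up, and it assumes the object has not been packed (packed objects need a different reader).

```python
import zlib
from pathlib import Path


def read_loose_object(git_dir, object_id):
    """Return (type, body) for a loose object stored under git_dir/objects.

    A loose object is a zlib-compressed stream of the form
    b"<type> <size>\\0<body>".
    """
    path = Path(git_dir) / "objects" / object_id[:2] / object_id[2:]
    raw = zlib.decompress(path.read_bytes())
    header, _, body = raw.partition(b"\x00")
    obj_type, size = header.decode("ascii").split(" ")
    if int(size) != len(body):
        raise ValueError(f"size mismatch in object {object_id}")
    return obj_type, body


# Example call (needs a real repository, so it is left commented out):
# obj_type, body = read_loose_object(".git", "7347addd901afc7d237a3e9c9512c9b0d05c6cf7")
```

For a commit, body is plain text; for trees it is a binary list of entries, so you would need a further parsing step there.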
Objects are stored as single files in the objects directory. These are called loose objects. When you run git gc, objects that can no longer be reached are pruned and the remaining ones are packed together into a single file and delta compressed. This is more space efficient and compacts the repository. After you run gc, you can look at the .git/objects/pack/ directory to see Git packfiles. To unpack them, you can use the plumbing command git unpack-objects. The .git/objects/info/packs file contains a list of the packfiles that are currently present.
References
The next thing you need to know is what references are. These are pointers to certain commits or objects. Your branches and other such things are implemented as references. There are two kinds: "real" (which are like hard links in a file system) and "symbolic" (which are pointers to real references, like symbolic links).
These are located in the .git/refs directory. For example, in the above repository, I'm on the master branch, and my master reference, located at .git/refs/heads/master, points to my latest commit.
The current branch is stored in the symbolic reference HEAD, located at .git/HEAD. It will change if you switch branches.
Similarly, tags are references too, but unlike branches they do not move.
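Following a reference by hand is just a matter of reading small text files. Here is a rough Python sketch; resolve_ref and its max_depth parameter are my own names. It assumes the refs are stored as individual files. After packing, some refs may live in .git/packed-refs instead, which this sketch does not read.

```python
from pathlib import Path


def resolve_ref(git_dir, name="HEAD", max_depth=5):
    """Follow symbolic references until an object ID is reached.

    A symbolic ref file contains "ref: <path>"; a real ref file
    contains the object ID as hex text.
    """
    git_dir = Path(git_dir)
    for _ in range(max_depth):
        value = (git_dir / name).read_text().strip()
        if not value.startswith("ref: "):
            return value  # a real reference (or a detached HEAD)
        name = value[len("ref: "):]
    raise ValueError("too many levels of symbolic references")


# Example call (needs a real repository, so it is left commented out):
# print(resolve_ref(".git"))
```

On a normal checkout this reads .git/HEAD, sees something like "ref: refs/heads/master", and then reads that file to get the commit ID.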
The entire repository is managed using just a DAG of commits (each of which points to a tree representing the files at a point in time) and references that point to various commits on the DAG so that you can manipulate them.
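Since the question is about history, here is a sketch of walking that DAG in Python. The names parse_commit, walk_history and the load_commit callback are hypothetical; load_commit stands for whatever you use to fetch the raw body of a commit object (loose or packed). The walk is a plain breadth-first traversal, so the order is not the same as what git log prints.

```python
from collections import deque


def parse_commit(body: bytes):
    """Extract the tree ID and parent IDs from a raw commit body.

    The headers ("tree ...", "parent ...", "author ...", ...) come first
    and are separated from the message by a blank line.
    """
    headers, _, _message = body.partition(b"\n\n")
    tree, parents = None, []
    for line in headers.decode("utf-8", "replace").split("\n"):
        key, _, value = line.partition(" ")
        if key == "tree":
            tree = value
        elif key == "parent":
            parents.append(value)
    return tree, parents


def walk_history(start, load_commit):
    """Yield (commit_id, tree_id) for every commit reachable from start."""
    seen = {start}
    queue = deque([start])
    while queue:
        commit_id = queue.popleft()
        tree, parents = parse_commit(load_commit(commit_id))
        yield commit_id, tree
        for parent in parents:
            if parent not in seen:  # merges can reach a commit twice
                seen.add(parent)
                queue.append(parent)
```

To turn this into a file history, you would look up the file's entry in each commit's tree and report the commits where it differs from the entry in the parent's tree.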
Further reading