I have a patch that looks like this:
diff --git a/tools/python/xen/lowlevel/xc/xc.c b/tools/python/xen/lowlevel/xc/xc.c
15 index e220f68..e611b24 100644
16 --- a/tools/python/xen/lowlevel/xc/xc.c
17 +++ b/tools/python/xen/lowlevel/xc/xc.c
18 @@ -228,6 +228,7 @@ static PyObject *pyxc_vcpu_setaffinity(XcObject *self,
19 int vcpu = 0, i;
20 xc_cpumap_t cpumap;
21 PyObject *cpulist = NULL;
I want to know which commit generated the patch, and how to parse the line
15 index e220f68..e611b24 100644
in the patch.
Let's take a look at the output from git show. (This is actual output from a real repo, although I'll snip most bits.)

Here, d362e62 is the "short version" of the true name of the commit, i.e., its SHA-1. The "long" form is the full 40-character version, which appears on the first line of the git show output.

Besides the commit text, the commit itself contains a "tree" (and zero or more "parents"). We can see this with git cat-file -p.

We can look at the "tree" as well. I could use the "true name" SHA-1 above, but here I use a bit of git syntax: a commit identifier followed by ^{tree} tells git to extract the tree ID from the commit ID.

I left in the line for
fmt as it is a symlink to fmt.py. The symlink has mode 120000, which tells git that the blob data is actually the target of the symlink. The file, fmt.py, has mode 100755, which tells git that it's an ordinary file and that it is executable (it's a Python script). This is the source of the 100644 or 100755 you see in the index line.

The "true name" of the blob (file object) in the git repo is that 40-character SHA-1. The 7-character abbreviated version for fmt.py is ba772ee. This is the second of the two numbers separated by .. on the index line.

The first number on that line is the "true name" in the git repo of the previous version of the file, i.e., the version of
fmt.py that was in the repo before I created commit d362e62.

We can use another bit of special git syntax to see these as well.[1] As documented in gitrevisions, following a commit-specifier with a hat character ^ (circumflex, up-arrow, whatever you like to call it) tells git to find the first parent of that commit. So resolving d362e62^ tells us that the commit before the commit I gave to git show is the one named 41f3a6b.... And, sure enough, if we git cat-file -p that, we get another commit with another tree, and if we git cat-file that tree-ID and look for fmt.py we will find another blob with another SHA-1.

And there it is: cc4c267 is the abbreviated form of the "true name" of the file stored in the previous commit. This is the first number in the index line.

I wrote this all out in long form to illustrate how git gets from "point A" to "point B". But, just as with the short-hand syntax d362e62^{tree}, there is a very easy way to get the blob SHA-1 values using git rev-parse: the commit:path form described in gitrevisions, such as d362e62:fmt.py for the new version and d362e62^:fmt.py for the previous one.

If you want the shortened versions, use git rev-parse --short to truncate the SHA-1 values to (normally) 7 characters.
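The pieces of the index line described so far (old blob, new blob, file mode) can be pulled apart mechanically. The following is only a minimal sketch: parse_index_line is a hypothetical helper, not something git provides, and it assumes the abbreviated IDs are lowercase hex and that the mode, when present, is six octal digits.

```python
import re
import stat

# Hypothetical helper, not part of git. It searches for the "index" header
# anywhere in the line, so a stray leading line number such as "15" is skipped.
INDEX_RE = re.compile(r"index ([0-9a-f]+)\.\.([0-9a-f]+)(?: ([0-7]{6}))?$")


def parse_index_line(line):
    """Split a diff index line into old blob ID, new blob ID, and mode."""
    match = INDEX_RE.search(line.strip())
    if match is None:
        raise ValueError("not an index line: %r" % line)
    old_blob, new_blob, mode_text = match.groups()

    kind = None  # stays None for modes this sketch does not classify
    if mode_text is not None:
        mode = int(mode_text, 8)
        if stat.S_ISLNK(mode):
            kind = "symlink"
        elif stat.S_ISREG(mode):
            # Any execute bit set means git recorded the file as executable.
            kind = "executable file" if mode & 0o111 else "ordinary file"

    return {
        "old_blob": old_blob,
        "new_blob": new_blob,
        "mode": mode_text,
        "kind": kind,
    }


result = parse_index_line("15 index e220f68..e611b24 100644")
print(result)
```

For the line in the question, the first field is the abbreviated ID of the previous version of xc.c, the second is the ID of the new version, and the mode marks an ordinary, non-executable file.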
The 15 is a line number you (or someone somewhere) added, and now you know what the rest of the values on the index line are. But to find the commit... well, that's the hard part. The commit is what finds the other values. There is no link from "other values" back to "commit": the "arrows", as it were, only point from commits to trees, and then from trees to blobs. There are no pointers from blobs to trees, nor from trees to commits.

Git always starts with some sort of externally specified name. Usually this is a branch name or tag, or a "symbolic reference" (as
HEAD normally is, when you don't have a "detached head"). The reference locates a commit.[2] If the reference is a branch name, that commit is the "tip" of that branch.[3] If it's a tag, it still finds a commit. If it's HEAD, and HEAD is the name of a branch like master, git just turns HEAD into master and then turns master into a commit. In other words, the commit is where you start, usually by going from name to commit-ID, but you can almost always specify a "raw" SHA-1 ID here.

Once git has a commit-ID, that commit identifies more commits (its parents) and a tree. The tree identifies sub-trees if needed, and the tree and its sub-trees identify blobs. Starting from all the commits that have "external names", git eventually finds all trees and all blobs, and any trees or blobs in the repository that are not found this way are eligible for garbage-collection when you run git gc (or when git gc runs automatically). (This is how deleted branches, and any number of special temporary files that git creates internally, are cleaned up later.)

[1] Git has a lot of special syntax. The most useful ones to memorize, in my opinion:
- master^ = "parent of master"
- ~N after a thing = back up N parents: master~2 = "grandparent of master"
- X..Y = "all revisions selected by Y, excluding all revisions selected by X": git log master..devel = "log all commits on branch devel that are not on master"

The .. syntax is also used in git diff, but here instead of "stuff on Y that's not on X", you get a direct comparison of the version associated with X against the version associated with Y.

[2] I'm deliberately skipping over "annotated tags", which also have their own repository objects. In some cases git will access the tag object, and in others (when it needs a commit, tree, and/or blob) git will automatically follow the annotated tag. Internally, an annotated tag looks very similar to a commit, except that instead of a tree and parents, it has a reference to another git repository object: usually directly to a commit, but sometimes to another tag, and in theory you can make an annotated tag for a tree or a blob, skipping over the commit part entirely.
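The revision syntax from footnote 1 can be modeled on a toy commit graph. This is a sketch of the semantics only, not of how git implements them: the commit names A through E and the placement of the master and devel labels are invented for illustration, and first_parent, back_up, and dot_dot are hypothetical helper names.

```python
# Toy commit graph: each commit maps to its parents, first parent first.
# Commit names and branch positions are made up for illustration.
PARENTS = {
    "A": [],
    "B": ["A"],
    "C": ["B"],  # master points here
    "D": ["B"],
    "E": ["D"],  # devel points here
}
BRANCHES = {"master": "C", "devel": "E"}


def first_parent(commit):
    """Model of rev^ : the first parent of a commit."""
    return PARENTS[commit][0]


def back_up(commit, n):
    """Model of rev~N : follow the first parent N times."""
    for _ in range(n):
        commit = first_parent(commit)
    return commit


def reachable(commit):
    """Every commit that can be reached from the given one, itself included."""
    seen = set()
    stack = [commit]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(PARENTS[current])
    return seen


def dot_dot(x, y):
    """Model of X..Y : commits reachable from Y but not from X."""
    return reachable(y) - reachable(x)


print(first_parent(BRANCHES["master"]))  # master^
print(back_up(BRANCHES["master"], 2))    # master~2
print(sorted(dot_dot(BRANCHES["master"], BRANCHES["devel"])))  # master..devel
```

In this toy graph master^ is B, master~2 is A, and master..devel selects D and E, the commits on devel that are not on master.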
[3] A branch name always points to the tip of its own branch, but that branch may be just a part of another branch. For instance, suppose you have a nice linear sequence of commits ending at C7, where C7 has C6 as its parent, C6 has C5, and so on. If branch label X is a reference to commit C5, then branch X ends at C5. If branch label Y points to C7, branch Y ends at C7. In this case branch Y "contains" branch X, but not vice versa.
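Coming back to the original question: because the links only run from commits to trees to blobs, finding the commit behind an index line means starting from a branch tip and checking each commit in turn. The sketch below models that search on an invented history. The commit names c1 through c4, the flattened path-to-blob "trees", and the find_commits helper are all assumptions for illustration; only the two blob abbreviations come from the question. A patch can also be a diff between two arbitrary commits, so a search like this yields candidates rather than a guaranteed answer.

```python
# Toy repository. Each commit records its parents and a flattened tree that
# maps a path to a blob ID. The history is invented; only the blob IDs
# e220f68 and e611b24 are taken from the index line in the question.
COMMITS = {
    "c1": {"parents": [], "tree": {"xc.c": "aaaaaaa"}},
    "c2": {"parents": ["c1"], "tree": {"xc.c": "e220f68"}},
    "c3": {"parents": ["c2"], "tree": {"xc.c": "e611b24"}},
    "c4": {"parents": ["c3"], "tree": {"xc.c": "e611b24", "other.c": "bbbbbbb"}},
}


def find_commits(tip, path, old_blob, new_blob):
    """Walk back from tip and return the commits that changed path from
    old_blob (in some parent) to new_blob (in the commit itself)."""
    found = []
    seen = set()
    stack = [tip]
    while stack:
        commit = stack.pop()
        if commit in seen:
            continue
        seen.add(commit)
        info = COMMITS[commit]
        stack.extend(info["parents"])

        # The commit must hold the new version of the file...
        if info["tree"].get(path) != new_blob:
            continue
        # ...and at least one parent must hold the old version.
        for parent in info["parents"]:
            if COMMITS[parent]["tree"].get(path) == old_blob:
                found.append(commit)
                break
    return found


print(find_commits("c4", "xc.c", "e220f68", "e611b24"))
```

Here only c3 qualifies: c4 also holds the new blob, but its parent already had it, so c4 did not introduce the change.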