How to read the index line in diff --git output

Posted 2019-09-03 02:57

Question:

I have a patch that looks like this:

diff --git a/tools/python/xen/lowlevel/xc/xc.c b/tools/python/xen/lowlevel/xc/xc.c
 15 index e220f68..e611b24 100644
 16 --- a/tools/python/xen/lowlevel/xc/xc.c
 17 +++ b/tools/python/xen/lowlevel/xc/xc.c
 18 @@ -228,6 +228,7 @@ static PyObject *pyxc_vcpu_setaffinity(XcObject *self,
 19      int vcpu = 0, i;
 20      xc_cpumap_t cpumap;
 21      PyObject *cpulist = NULL;

And I want to know which commit generates the patch, and how to parse 15 index e220f68..e611b24 100644 in the patch?

Answer 1:

Let's look at the output of git show. (This is actual output from a real repo, although I'll snip most of it.)

$ git show d362e62
commit d362e62490dd7f59c170a0a050a203fa0eda9f5a
[snip]
diff --git a/fmt.py b/fmt.py
index c44c267..ba772ee 100755
[snip]

Here, d362e62 is the "short version" of the commit's true name, i.e., its SHA-1. The "long" form is the full 40-character hash, which appears on the first line of the git show output.

Besides the commit text, the commit itself contains a "tree" (and zero or more "parents"). We can see this with git cat-file -p:

$ git cat-file -p d362e62
tree 0b9bebfee8890b242875af0df209fd9f335bf14d
parent 41f3a6bcba1f5f7059133f862727809f49ff4657
[snip author, committer, and commit text]

We can look at the "tree" as well. I could use the "true name" SHA-1 above, but here I use a bit of git syntax: a commit identifier followed by ^{tree} tells git to extract the tree ID from the commit ID.

$ git cat-file -p d362e62^{tree}
[snip]
120000 blob 7417b50d02819bbebeac0f4104850549935f7089    fmt
100755 blob ba772eeb6139de5a724d67d18ce01bfccaf57590    fmt.py
[snip]

I left in the line for fmt as it is a symlink to fmt.py. The symlink has mode 120000, which tells git that the blob data is actually the target of the symlink. The file, fmt.py, has mode 100755, which tells git that it's an ordinary file and that it is executable (it's a Python script). This is the source of the 100644 or 100755 you see in the index line.
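To make the mode column concrete, here is a small Python sketch. It is not part of git, and the names parse_tree_line and describe_mode are hypothetical. It splits one line of git cat-file -p tree output into its four fields and looks up a description of the mode. The table lists only the common modes and is not exhaustive.

```python
from typing import NamedTuple

# Common modes seen in tree listings; this table is not exhaustive.
GIT_MODES = {
    "100644": "regular file",
    "100755": "executable file",
    "120000": "symbolic link (blob holds the link target)",
    "040000": "tree (subdirectory)",
    "160000": "gitlink (submodule commit)",
}


class TreeEntry(NamedTuple):
    mode: str
    kind: str       # "blob", "tree" or "commit"
    object_id: str
    name: str


def parse_tree_line(line: str) -> TreeEntry:
    """Split one line of `git cat-file -p <tree>` output into its fields.

    git separates the name from the other fields with a tab, so names
    containing spaces survive intact.
    """
    meta, name = line.rstrip("\n").split("\t", 1)
    mode, kind, object_id = meta.split()
    return TreeEntry(mode, kind, object_id, name)


def describe_mode(mode: str) -> str:
    return GIT_MODES.get(mode, "unknown mode " + mode)


entry = parse_tree_line(
    "100755 blob ba772eeb6139de5a724d67d18ce01bfccaf57590\tfmt.py")
print(entry.name, "->", describe_mode(entry.mode))  # fmt.py -> executable file
```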

The "true name" of the blob (file object) in the git repo is that 40-character SHA-1. The 7-character abbreviated version for fmt.py is ba772ee. This is the second of the two ..-separated values on the index line.

The first number on that line is the "true name" in the git repo of the previous version of the file, i.e., the version of fmt.py that was in the repo before I created commit d362e62.
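Putting the two IDs and the mode together, a hypothetical helper parse_index_line could pull the fields out of a diff's index line. This sketch assumes the line has the form index <old>..<new> [<mode>]. It also tolerates a hand-added leading line number, like the 15 in the question. The mode is optional because git omits it from this line in some cases, for example when a file is newly created.

```python
import re
from typing import NamedTuple, Optional


class IndexLine(NamedTuple):
    old_blob: str        # abbreviated blob ID of the "before" version
    new_blob: str        # abbreviated blob ID of the "after" version
    mode: Optional[str]  # file mode, when git prints one on this line


# "index <old>..<new> [<mode>]", optionally preceded by a line number
# that someone added by hand (as in "15 index e220f68..e611b24 100644").
INDEX_RE = re.compile(
    r"^\s*(?:\d+\s+)?index ([0-9a-f]+)\.\.([0-9a-f]+)(?: ([0-7]{6}))?\s*$"
)


def parse_index_line(line: str) -> IndexLine:
    m = INDEX_RE.match(line)
    if m is None:
        raise ValueError("not a git diff index line: %r" % line)
    return IndexLine(m.group(1), m.group(2), m.group(3))


print(parse_index_line(" 15 index e220f68..e611b24 100644"))
# IndexLine(old_blob='e220f68', new_blob='e611b24', mode='100644')
```

The old ID names the file's contents before the change and the new ID names them after it. Neither says anything about which commit they came from.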

We can use another bit of special git syntax to see these as well.[1] As documented in gitrevisions, following a commit specifier with a hat character ^ (circumflex, up-arrow, whatever you like to call it) tells git to find the first parent of that commit. So:

$ git rev-parse d362e62^
41f3a6bcba1f5f7059133f862727809f49ff4657

tells us that the commit before the commit I gave to git show is the one named 41f3a6b.... And, sure enough, if we git cat-file -p that, we get another commit with another tree, and if we git cat-file that tree-ID and look for fmt.py we will find another blob with another SHA-1:

$ git cat-file -p 41f3a6b
tree cbfb63beec96eebf0c73ba6a501cc8151adfec8a
parent 80eeb496ea3f538aa14acdc6b0815024a5e99c7e
[snip]
$ git cat-file -p cbfb63beec96eebf0c73ba6a501cc8151adfec8a | grep fmt.py
100755 blob c44c267c4603838ac7a54aa450b33d0dd7a8bebc    fmt.py
$ 

And there it is: c44c267 is the abbreviated form of the "true name" of the file stored in the previous commit. This is the first number on the index line.

I wrote this all out in long form to illustrate how git gets from "point A" to "point B". But, just as with the short-hand syntax d362e62^{tree}, there is a very easy way to get the blob SHA-1 values using git rev-parse:

$ git rev-parse d362e62:fmt.py
ba772eeb6139de5a724d67d18ce01bfccaf57590
$ git rev-parse d362e62^:fmt.py
c44c267c4603838ac7a54aa450b33d0dd7a8bebc

If you want the shortened versions, use git rev-parse --short to truncate the SHA-1 values to (normally) 7 characters.
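Where do those 40-character names come from? The following sketch assumes a repository that uses the default SHA-1 object format. Under that assumption, a blob's name is the SHA-1 of a short header followed by the file's raw contents. The header is "blob", a space, the content length in bytes, and a NUL byte. git hash-object computes the same value. The abbreviate helper is a hypothetical simplification: git may print more than 7 characters when a shorter prefix would be ambiguous in the repository.

```python
import hashlib


def git_blob_id(data: bytes) -> str:
    """SHA-1 that git assigns to a blob with this content.

    Assumes the default SHA-1 object format (not a SHA-256 repository).
    """
    header = b"blob %d\x00" % len(data)
    return hashlib.sha1(header + data).hexdigest()


def abbreviate(object_id: str, length: int = 7) -> str:
    """Fixed-length truncation; real git may use more characters."""
    return object_id[:length]


print(git_blob_id(b""))              # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
print(abbreviate(git_blob_id(b"")))  # e69de29
```

Because the name depends only on the content, identical files in different commits share one blob. This is one reason a blob cannot point back at "its" commit.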

So:

And I want to know which commit generates the patch, and how to parse 15 index e220f68..e611b24 100644 in the patch?

The 15 is a line number you (or someone somewhere) added, and now you know what the rest of the values on the index line are. Finding the commit, however, is the hard part. It is the commit that leads to the other values, and there is no link from those values back to the commit. The "arrows", as it were, only point from commits to trees, and then from trees to blobs. There are no pointers from blobs to trees, nor from trees to commits.
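Since the arrows only go downward, the only way to answer "which commit produced this patch?" is to search. You start from some names, walk every reachable commit, and compare each commit's blob for the path with its first parent's blob. Below is a minimal Python sketch of that search. It works over a hypothetical in-memory model with made-up IDs, and trees are flattened to {path: blob}; it does not read a real repository. blob_at plays the role of git rev-parse <commit>:<path>.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical in-memory object store. Real git nests one tree per
# directory; here a tree is flattened to {path: blob_id} to stay short.
Commits = Dict[str, Tuple[str, List[str]]]   # commit -> (tree, parents)
Trees = Dict[str, Dict[str, str]]            # tree -> {path: blob}


def blob_at(commits: Commits, trees: Trees, commit: str, path: str) -> Optional[str]:
    """Rough analogue of `git rev-parse <commit>:<path>` on the toy store."""
    tree_id, _ = commits[commit]
    return trees[tree_id].get(path)


def find_commits_for_patch(commits: Commits, trees: Trees, tips: List[str],
                           path: str, old_abbrev: str, new_abbrev: str) -> List[str]:
    """Return commits C reachable from `tips` where C^:path starts with
    old_abbrev and C:path starts with new_abbrev."""
    found: List[str] = []
    seen = set()
    stack = list(tips)
    while stack:
        c = stack.pop()
        if c in seen:
            continue
        seen.add(c)
        _, parents = commits[c]
        stack.extend(parents)
        if not parents:  # a root commit has no C^ to compare with
            continue
        old = blob_at(commits, trees, parents[0], path)
        new = blob_at(commits, trees, c, path)
        if old and new and old.startswith(old_abbrev) and new.startswith(new_abbrev):
            found.append(c)
    return found


commits = {"c1": ("t1", []), "c2": ("t2", ["c1"]), "c3": ("t3", ["c2"])}
trees = {
    "t1": {"xc.c": "1111111aaaa"},
    "t2": {"xc.c": "2222222bbbb"},
    "t3": {"xc.c": "3333333cccc"},
}
print(find_commits_for_patch(commits, trees, ["c3"], "xc.c", "2222222", "3333333"))
# ['c3']
```

Several commits can match if the same before/after pair of file contents occurs more than once in history. A patch made with git diff between two arbitrary commits may match none, since no single commit "generated" it.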

Git always starts with some sort of externally specified name. Usually this is a branch name or tag, or a "symbolic reference" (as HEAD normally is, when you don't have a "detached HEAD"). The reference locates a commit.[2] If the reference is a branch name, that commit is the "tip" of that branch.[3] If it's a tag, it still finds a commit. If it's HEAD, and HEAD names a branch like master, git just turns HEAD into master and then turns master into a commit. In other words, the commit is where you start, usually by going from a name to a commit ID, but you can almost always specify a "raw" SHA-1 ID here.

Once git has a commit ID, that commit identifies more commits (its parents) and a tree. The tree identifies sub-trees if needed, and the tree and its sub-trees identify blobs. Starting from all the commits that have "external names", git eventually finds every reachable commit, tree, and blob. Any objects in the repository that are not found this way are eligible for garbage collection when you run git gc (or when git gc runs automatically). (This is how deleted branches, and any number of special temporary files that git creates internally, are cleaned up later.)
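The "find everything reachable from the names" step is a graph traversal. Here is a Python sketch of just the marking idea on a hypothetical toy graph. Real git gc is more conservative. For example, it also keeps objects referenced from reflogs, and it gives unreachable objects a grace period before pruning them.

```python
from typing import Dict, List, Set

# Hypothetical toy graph: each object lists the objects it points to.
# Commits point to their tree and parents, trees to blobs, blobs to
# nothing. Arrows only go "downward".
Graph = Dict[str, List[str]]


def reachable(graph: Graph, roots: List[str]) -> Set[str]:
    """Every object found by following arrows from the named roots."""
    seen: Set[str] = set()
    stack = list(roots)
    while stack:
        obj = stack.pop()
        if obj in seen:
            continue
        seen.add(obj)
        stack.extend(graph.get(obj, []))
    return seen


def gc_candidates(graph: Graph, roots: List[str]) -> Set[str]:
    """Objects that no name leads to (simplified: no reflogs, no grace period)."""
    return set(graph) - reachable(graph, roots)


graph: Graph = {
    "C1": ["T1"], "T1": ["B1"], "B1": [],
    "C2": ["T2", "C1"], "T2": ["B1", "B2"], "B2": [],
    # C3 was on a branch that has since been deleted:
    "C3": ["T3", "C2"], "T3": ["B1", "B3"], "B3": [],
}
print(sorted(gc_candidates(graph, ["C2"])))  # ['B3', 'C3', 'T3']
```

B1 survives even though the unreachable tree T3 also points to it, because C2's history still reaches it.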


[1] Git has a lot of special syntax. The most useful ones to memorize, in my opinion:

  • hat after thing = parent: master^ = "parent of master"
  • tilde and number N after thing = back up N parents: master~2 = "grandparent of master"
  • X..Y = "all revisions selected by Y, excluding all revisions selected by X": git log master..devel = "log all commits on branch devel that are not on master"

The .. syntax is also used in git diff, but here instead of "stuff on Y that's not on X", you get a direct comparison of the version associated with X against the version associated with Y.
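Here is a small sketch of these three operators on a hypothetical toy history, where each commit maps to its list of parents, first parent first. It models the rules stated above, not git's full revision parser.

```python
from typing import Dict, List, Set

# Hypothetical history: master ends at C, devel ends at E, and both
# branches share A and B.
#   A <- B <- C        (master)
#         \
#          D <- E      (devel)
Parents = Dict[str, List[str]]


def hat(parents: Parents, c: str) -> str:
    """c^ : the first parent of c."""
    return parents[c][0]


def tilde(parents: Parents, c: str, n: int) -> str:
    """c~n : follow first parents n times."""
    for _ in range(n):
        c = hat(parents, c)
    return c


def ancestors(parents: Parents, c: str) -> Set[str]:
    """c plus every commit reachable from it through any parent."""
    seen: Set[str] = set()
    stack = [c]
    while stack:
        x = stack.pop()
        if x not in seen:
            seen.add(x)
            stack.extend(parents[x])
    return seen


def dotdot(parents: Parents, x: str, y: str) -> Set[str]:
    """X..Y as used by git log: reachable from Y but not from X."""
    return ancestors(parents, y) - ancestors(parents, x)


parents: Parents = {"A": [], "B": ["A"], "C": ["B"], "D": ["B"], "E": ["D"]}
print(hat(parents, "C"), tilde(parents, "C", 2))  # B A
print(sorted(dotdot(parents, "C", "E")))          # ['D', 'E']
```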

[2] I'm deliberately skipping over "annotated tags", which are also stored as objects in the repository. In some cases git will access the tag object itself, and in others (when it needs a commit, tree, and/or blob) git will automatically follow the annotated tag. Internally, an annotated tag looks very similar to a commit, except that instead of a tree and parents, it has a reference to another git repository object. This is usually a commit, but it is sometimes another tag. In theory you can even make an annotated tag for a tree or a blob, skipping over the commit part entirely.

[3] A branch name always points to the tip of its own branch, but that branch may be just a part of another branch. For instance, suppose you have a nice linear sequence of commits:

...<-- C3 <-- C4 <-- C5 <-- C6 <-- C7

where C7 has C6 as its parent, C6 has C5, and so on. If branch label X is a reference to commit C5, then branch X ends at C5. If branch label Y points to C7, branch Y ends at C7. In this case branch Y "contains" branch X, but not vice versa.
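The same "contains" relationship can be checked by walking parent links. The following is a minimal sketch of this footnote's example, with the branch names X and Y and the commits C3 through C7 taken from the text. C3's parent is elided in the diagram, so it is stored as None here.

```python
from typing import Dict, Optional, Set

# The linear chain from the footnote; C3's own parent is elided ("...").
parent: Dict[str, Optional[str]] = {
    "C3": None, "C4": "C3", "C5": "C4", "C6": "C5", "C7": "C6",
}
branches = {"X": "C5", "Y": "C7"}


def commits_on(tip: str) -> Set[str]:
    """All commits reachable from tip by following parent links."""
    result: Set[str] = set()
    c: Optional[str] = tip
    while c is not None:
        result.add(c)
        c = parent[c]
    return result


def contains(outer: str, inner: str) -> bool:
    """True if branch `outer` contains the tip commit of branch `inner`."""
    return branches[inner] in commits_on(branches[outer])


print(contains("Y", "X"), contains("X", "Y"))  # True False
```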



Tags: git diff patch