Get changes between a commit and its parent with l

Posted 2019-03-27 02:01

Question:

I am working with libgit2sharp (a C# wrapper for libgit2) and have been running into issues because it doesn't have a lot of the functionality I am hoping for (hopefully I can contribute to it soon; this seems like a really useful project).

What I am trying to do right now is get a list of the files changed between a particular commit and its parent. I am not trying to figure out what changed between a merge commit and its two parents; I am only interested in regular commits.

These guys (https://github.com/libgit2/libgit2sharp/issues/89) are working on something similar. I think their procedure is a sound idea, but my understanding of Git internals is a little weak (there are loads of end-user guides to Git but not many on its internal structure).

I am curious how Git itself implements the "git diff" command. Supposedly Git doesn't actually store deltas but rather a full version of each file (if a file is unchanged, the tree just points to the existing SHA; this is described in various sources, such as http://xentac.net/2012/01/19/the-real-difference-between-git-and-mercurial.html). This seems to make it harder to get the changes between two commits (in my case a particular commit and its single parent) because the changes aren't stored as part of the commit (which is clear if you examine the Commit class in libgit2sharp's Commit.cs file).

What I can get access to from a commit is the tree. Would it make sense to do the following to find this information:

1) Start at the desired commit, walk down its tree, and store all the blob SHA values in a set.

2) Start at the parent of the desired commit, walk down its tree, and store all its blob SHA values in another set.

3) The SHAs of the changed files will be the ones that are not in the intersection of the two sets (that is, the ones that appear in only one of them).

The problem I see with this approach is that there doesn't seem to be a way to get the filename from a blob's SHA value (I don't see anything that can do this in libgit2sharp's Blob.cs file).

I know there are a lot of facets to this question but they are part of this large goal to get a particular piece of data from git.

Thanks.

Answer 1:

What you're after, a tree diffing feature, already exists in libgit2 and is declared in the tree.h header.

The git_tree_diff() function compares two Trees and invokes a callback for every difference (addition, modification and deletion). The callback receives a git_tree_diff_data structure containing the file path of the blob in question, its status, the former and current file modes, and the former and current SHAs.
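To make the shape of that callback concrete, here is a minimal Python sketch of a callback-driven diff over two flattened listings. It is only an illustration under stated assumptions: DiffRecord, diff_entries and the path -> (mode, SHA) dictionaries are hypothetical stand-ins loosely modelled on the fields listed above, not libgit2's actual structures or algorithm.

```python
from typing import Callable, Dict, NamedTuple, Optional, Tuple

# Hypothetical stand-in for a tree entry: (file mode, blob SHA).
Entry = Tuple[int, str]


class DiffRecord(NamedTuple):
    """Hypothetical record loosely modelled on the fields described above."""
    path: str
    status: str  # "added", "deleted" or "modified"
    old_mode: Optional[int]
    new_mode: Optional[int]
    old_sha: Optional[str]
    new_sha: Optional[str]


def diff_entries(old: Dict[str, Entry], new: Dict[str, Entry],
                 callback: Callable[[DiffRecord], None]) -> None:
    """Invoke callback once for every path whose entry differs."""
    for path in sorted(old.keys() | new.keys()):
        before, after = old.get(path), new.get(path)
        if before == after:
            continue  # same mode and same SHA: nothing to report
        if before is None:
            callback(DiffRecord(path, "added", None, after[0], None, after[1]))
        elif after is None:
            callback(DiffRecord(path, "deleted", before[0], None, before[1], None))
        else:
            callback(DiffRecord(path, "modified",
                                before[0], after[0], before[1], after[1]))


# Made-up data: the parent's listing and the commit's listing.
parent = {"README": (0o100644, "aaa1"), "src/main.c": (0o100644, "bbb2")}
commit = {"README": (0o100644, "aaa1"), "src/main.c": (0o100644, "ccc3"),
          "src/util.c": (0o100644, "ddd4")}

records = []
diff_entries(parent, commit, records.append)
for record in records:
    print(record.status, record.path)
```

Because every record carries the path next to the old and new SHAs, the caller never has to map a SHA back to a filename.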

From LibGit2Sharp's perspective, it would make more sense to leverage existing libgit2 features rather than re-implement them in C#. However, even if you can get some inspiration from the existing Interop definitions, things tend to get tricky quickly when trying to tame the .Net/native interop layer.

From your perspective (as contributing to LibGit2Sharp may not be your primary goal ;)), another option would be to port the C code to C#, relying on LibGit2Sharp's existing features to walk down the trees. git_tree_diff() (and its satellite functions) is a very clean piece of code, and although it does quite a complex job, the comments are very clear and helpful.
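As a rough idea of what such a walk could look like, here is a small Python sketch over an in-memory model of trees. It is not a port of git_tree_diff() and it uses no LibGit2Sharp calls; the tree() helper, sha_of() and changed_paths() are hypothetical names introduced for illustration. It builds each path up while descending, so no lookup from SHA back to filename is needed, and it skips any subtree whose SHA is identical on both sides.

```python
def tree(sha, entries):
    """Hypothetical in-memory tree: a SHA plus a name -> child mapping.
    A child is either another tree (dict) or a blob SHA (str)."""
    return {"sha": sha, "entries": entries}


def sha_of(node):
    """Return the SHA of a blob (str) or of a tree (dict)."""
    return node if isinstance(node, str) else node["sha"]


def changed_paths(old, new, prefix=""):
    """Return {path: status} for every blob that differs between two trees.
    Either argument may be None, meaning that side has no tree at this path."""
    changes = {}
    old_entries = old["entries"] if old is not None else {}
    new_entries = new["entries"] if new is not None else {}
    for name in sorted(old_entries.keys() | new_entries.keys()):
        a, b = old_entries.get(name), new_entries.get(name)
        if a is not None and b is not None and sha_of(a) == sha_of(b):
            continue  # identical SHA: same blob, or an entirely unchanged subtree
        path = prefix + name
        a_tree = a if isinstance(a, dict) else None
        b_tree = b if isinstance(b, dict) else None
        if a_tree is not None or b_tree is not None:
            # At least one side is a directory: descend, extending the path.
            changes.update(changed_paths(a_tree, b_tree, path + "/"))
        a_blob = a if isinstance(a, str) else None
        b_blob = b if isinstance(b, str) else None
        if a_blob is not None and b_blob is not None:
            changes[path] = "modified"
        elif b_blob is not None:
            changes[path] = "added"
        elif a_blob is not None:
            changes[path] = "deleted"
    return changes


# Made-up SHAs: "docs" is untouched, so its subtree is never walked.
parent = tree("t1", {
    "README": "aaa1",
    "docs": tree("t3", {"guide.md": "eee5"}),
    "src": tree("t2", {"main.c": "bbb2"}),
})
commit = tree("t4", {
    "README": "aaa1",
    "docs": tree("t3", {"guide.md": "eee5"}),
    "src": tree("t5", {"main.c": "ccc3", "util.c": "ddd4"}),
})

result = changed_paths(parent, commit)
print(result)
```

Comparing by path rather than by a bare set of SHAs also distinguishes an added file from a modified one, which a plain set difference cannot do.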

References:

  • The git_tree_diff() function is implemented in src/tree.c
  • Tests exercising this feature are available here

Note: In order to bind git_tree_diff(), an issue should be opened in the libgit2 tracker requesting that the function declaration be marked GIT_EXTERN. Otherwise it won't be accessible from .Net.

UPDATE

Release v0.9.0 of LibGit2Sharp eventually brought the Tree to Tree diffing feature.

TreeChanges changes = repo.Diff.Compare(fromTree, newTree);

Exposed properties are:

  • Added/Modified lines
  • Collections of TreeEntry changes per kind of change (eg. Added, Modified, ...)
  • The diff Patch

You can find more about this feature and how to leverage the TreeChanges by taking a look at the unit tests in DiffTreeToTreeFixture.cs.