Several times, I have come across the statement that, if you move a single function from one file to another file, Git can track it. For example, this entry says, "Linus says that if you move a function from one file to another, Git will tell you the history of that single function across the move."
But I have a little bit of awareness of some of Git's under-the-hood design, and I don't see how this is possible. So I'm wondering ... is this is a correct statement? And if so, how is this possible?
My understanding is that Git stores each file's contents as a Blob, and each Blob has a globally unique identity which arises from the SHA hash of its contents and size. Git then represents folders as Trees. Any filename information belongs to the Tree, not to the Blob, so a file rename for example shows up as a change to a Tree, not to a Blob.
So if I have a file called "foo" with 20 functions in it, and a file called "bar" with 5 functions in it, and I move one of the functions from foo into bar (resulting in 19 and 6, respectively), how can Git detect that I moved that function from one file to another?
From my understanding, this would cause 2 new blobs to exist (one for the modified foo and one for the modified bar). I realize a diff could be calculated to show that the function was moved from one file to the other. But I don't see how history about the function could possibly become associated with bar instead of foo (not automatically, anyway).
If Git were to actually look inside of single files, and compute a blob per function (which would be crazy / infeasible, because you'd have to know how to parse any possible language), then I could see how this might be possible.
So ... is the statement correct or not? And if it is correct, then what is lacking in my understanding?
This functionality is provided through git blame -C
The -C option drives git into trying to find matches between addition or deletion of chunks of text in the file being reviewed and the files modified in the same changesets. Additional -CC, or -CCC extend the search. Type git help blame for the manual page.
Try for yourself in a test repo with git blame -C and you'll see that the block of code that you just moved is originated in the original file where it belonged to.
As of Git 2.15, git diff
now supports detection of moved lines with the --color-moved
option. It works for moves across files.
It works, obviously, for colorized terminal output. As far as I can tell, there is no option to indicate moves in plain text patch format, but that makes sense.
For default behavior, try
git diff --color-moved
The command also takes options, which currently are no
, default
, plain
, zebra
and dimmed_zebra
(Use git help diff
to get the latest options and their descriptions). For example:
git diff --color-moved=zebra
As to how it is done, you can glean some understanding from this email exchange by the author of the functionality.
A bit of this functionality is in git gui blame
(+ filename). It shows an annotation of the lines of a file, each indicating when it was created and when last changed. For code movement across a file, it shows the commit of the original file as a creation, and the commit where it was added to the current file as last change. Try it.
What I really would want is to give git log
as some argument a line number range additionally to a file path, and then it would show the history of this code block. There is no such option, if the documentation is right. Yes, from Linus' statement I too would think such a command should be readily available.
git doesn't actually track renames at all. A rename is just a delete and add, that's all. Any tools who show renames reconstruct them from this history information.
As such, tracking function renames is a simple matter of analyzing the diffs of all files in each commit after the fact. There's nothing particularly impossible about it; the existing rename tracking already handles 'fuzzy' renames, in which some changes are done to the file as well as renaming it; this requires looking at the contents to the files. It would be a simple extension to look for function renames as well.
I don't know if the base git tools actually do this however - they try to be language neutral, and function identification is very much not language neutral.
There's git diff
that will show you that certain lines disappeared from foo
and reappeared in bar
. If there are no other changes in these files in the same commit, the change will be easy to spot.
An intellectual git
client would be able to show you how lines moved from one file to another. A language-aware IDE would be able to correspond this change with a particular function.
A very similar thing happens when a file gets renamed. It just disappears under one name and reappears under another, but any reasonable tool is able to notice it and represent as a rename.