I’m executing the following command:
git log --name-only --pretty="format:%H %s" -- *.sql --grep="JIRA-154"
which returns results in the format:
[commitid1] [comment]
path/to/file1/file1.sql
path/to/file2/file2.sql
path/to/file3/file3.sql
[commitid2] [comment]
path/to/file2/file2.sql
path/to/file4/file4.sql
The output is redirected to a file, and the format is exactly what I’m looking for. Merge commits are a problem, however: the files that were changed as part of a merge are never listed. Instead I end up with something like the following:
[commitid3] [merge comment]
[commitid4] [comment]
path/to/file3/file3.sql
I’ve obviously misunderstood something here because I expect to see the files that changed during the merge listed. Is there a way to include these files in the output?
TL;DR
Try adding the -m option to your git log command. This makes Git "split" each merge, so that it diffs the merge twice, once against each parent. Without this or a similar option, git log finds the merges but does not look inside them at all.
Also, as ElpieKay commented, you need to put the --grep=<regexp> option before the --, because everything after the -- is treated as a path specification rather than an option. It is also a good idea to write "*.sql", i.e., with quotes, to prevent your shell from expanding the asterisk itself (the details vary from one shell to another and depend on whether there are any *.sql files in your current working directory).
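To make the effect of -m concrete, here is a minimal sketch that builds a throwaway repository and runs the corrected command with and without the flag. The branch names, file names, and commit messages are all made up for illustration; only the git log options come from the discussion above.

```shell
#!/bin/sh
# Throwaway demo repository. Every name below (branches, files, messages)
# is hypothetical and exists only for this illustration.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q
git symbolic-ref HEAD refs/heads/main   # name the initial branch "main"

echo 'select 1;' > base.sql
git add base.sql && git commit -q -m "JIRA-100 base"

git checkout -q -b feature
echo 'select 2;' > feature.sql
git add feature.sql && git commit -q -m "JIRA-154 add feature.sql"

git checkout -q main
echo 'select 3;' > main.sql
git add main.sql && git commit -q -m "JIRA-154 add main.sql"

# A true merge commit with two parents (main first, feature second).
git merge -q --no-ff -m "JIRA-154 merge feature" feature

# --grep comes before the "--", and the pathspec is quoted.
without_m=$(git log --name-only --grep="JIRA-154" --pretty="format:%H %s" -- "*.sql")
with_m=$(git log -m --name-only --grep="JIRA-154" --pretty="format:%H %s" -- "*.sql")

echo "=== without -m: the merge appears, but with no files under it ==="
echo "$without_m"
echo "=== with -m: the merge appears once per parent, each with files ==="
echo "$with_m"
```

With -m, the same merge hash shows up twice because each parent gets its own diff, so anything that post-processes the redirected output should expect duplicate commit IDs.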
Long version
As Tim Biegeleisen said, the problem stems from the nature of a merge commit.
Normally, to show you what changed in a commit, Git runs a simple git diff parent self, where parent and self are the commit's parent, and the commit itself, respectively. Both git log and git show do this, in slightly different ways and under slightly different circumstances. The most obvious difference is that git show defaults to showing a diff every time, but git log only does a diff when given -p or one of the various diff control options such as --name-only.
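As a rough illustration of the single-parent case, the sketch below compares an explicit parent-versus-commit diff with what git log reports for the same commit. The repository, file names, and messages are invented for the example.

```shell
#!/bin/sh
# Throwaway demo repository; file names and messages are hypothetical.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q

echo 'select 1;' > a.sql
git add a.sql && git commit -q -m "add a.sql"
echo 'select 2;' > b.sql
git add b.sql && git commit -q -m "add b.sql"

# Explicit diff between the commit's single parent and the commit itself.
by_diff=$(git diff --name-only HEAD^ HEAD)

# git log makes the same comparison once a diff option such as
# --name-only is given; the empty format suppresses the header, and
# sed drops any blank separator lines.
by_log=$(git log -1 --name-only --pretty="format:" | sed '/^$/d')

echo "git diff: $by_diff"
echo "git log:  $by_log"
```

For an ordinary commit the two listings should agree, which is the baseline that merge commits depart from.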
Merges are different
A merge commit is a commit with two [1] parents. This means that git log and git show would have to run two git diff commands. [2] And in fact, git show does run two diffs, but then, by default, turns them into a combined diff, which shows only those files whose merge-commit version differs from both parents. But for whatever reason, [3] git log does not do this by default.
Even when git log is showing diffs, though, it behaves particularly oddly (I might even say badly) on merges. While git log -p or git log --name-status runs a (single) diff on a regular commit, it does not run the diff at all on a commit with multiple visible parents, unless you force it to.
Using -m by itself always works. This flag essentially tells git log (and git show) to break up a merge into multiple separate "virtual commits". That is, if commit M is a merge with parents P1 and P2, then, for the purpose of the diff at least, Git acts as though there was a commit MP1 with parent P1, and a second commit MP2 with parent P2. You get two diffs (and two commit IDs in the diff headers).
Adding --first-parent tells git log to ignore the second (and any additional) parent of a merge, which leaves it with just one parent. This means git log won't follow the side branch at all. Hence you can use -m --first-parent, provided you're not interested in histories stemming from the other sides of merges. That gets you a single diff against just the first parent, instead of one diff per parent.
(Which parent is first? Well, it's the one that was your HEAD when you ran git merge. That's normally the "main line" of commits, i.e., the ones "on your branch". But if your group uses git pull casually, you probably do not want to ignore the other side of merges, as git pull turns other people's main-line work into "foxtrot merges" of small side branches.)
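The combination of -m and --first-parent can be tried out on a small throwaway history like the one sketched here. The branch names, file names, and commit messages are hypothetical.

```shell
#!/bin/sh
# Throwaway demo repository; every name here is hypothetical.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q
git symbolic-ref HEAD refs/heads/main   # name the initial branch "main"

echo 'select 1;' > base.sql
git add base.sql && git commit -q -m "JIRA-100 base"

git checkout -q -b feature
echo 'select 2;' > feature.sql
git add feature.sql && git commit -q -m "JIRA-154 add feature.sql"

git checkout -q main
echo 'select 3;' > main.sql
git add main.sql && git commit -q -m "JIRA-154 add main.sql"

# main was HEAD when the merge ran, so main's tip is the first parent.
git merge -q --no-ff -m "JIRA-154 merge feature" feature

# One diff per merge, taken against the first parent only; commits that
# exist only on the side branch are not visited at all.
first_parent=$(git log -m --first-parent --name-only --pretty="format:%H %s")
echo "$first_parent"
```

In this sketch the merge is listed once, with feature.sql under it, while the side-branch commit that originally added feature.sql is absent from the listing. Whether that trade-off is acceptable depends on whether the side-branch history matters to you.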
Combined diffs, again
Besides -m, you can supply -c or --cc (note that -c has one dash while --cc has two [4]) to git log to get it to produce a combined diff, just like git show. But, as with all combined diffs, this ignores files that match up between the merge commit and either parent. That is, given the same merge M again, this time Git compares M vs P1, and M vs P2. For any file F where M:F is the same as either P1:F or P2:F, Git shows nothing at all.
As it turns out, this is usually what you want. If file F in commit M matches file F in one of the two parent commits, that means the file came from that parent. The fact that F in P1 may not match F in P2 is usually not interesting: any change in F in either P1 or P2 is probably a result of some earlier change in history, and that's where we should take note of it, rather than at merge M.
That is the logic behind combined diffs, anyway. It's not applicable in all circumstances, which is why -m exists: to "split up" the merge into its constituent parts.
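The difference between a combined diff and a split one can be seen in a sketch like the following, where one file (shared.sql, a made-up name) is edited on both sides of the merge and another (feature.sql, also made up) comes from the side branch only.

```shell
#!/bin/sh
# Throwaway demo repository; every name here is hypothetical.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q
git symbolic-ref HEAD refs/heads/main   # name the initial branch "main"

printf '%s\n' 1 2 3 4 5 6 7 8 9 10 > shared.sql
git add shared.sql && git commit -q -m "base"

# Side branch: change the last line of shared.sql and add a new file.
git checkout -q -b feature
printf '%s\n' 1 2 3 4 5 6 7 8 9 ten > shared.sql
echo 'select 2;' > feature.sql
git add shared.sql feature.sql && git commit -q -m "feature work"

# Main line: change the first line of shared.sql.
git checkout -q main
printf '%s\n' one 2 3 4 5 6 7 8 9 10 > shared.sql
git add shared.sql && git commit -q -m "main work"

# The edits are far apart, so the merge is clean, and the merged
# shared.sql differs from the version in both parents.
git merge -q --no-ff -m "merge side branch" feature

# Combined diff: only files that differ from every parent.
combined=$(git log -1 -c --name-only --pretty="format:%H %s")
# Split diff: one listing per parent.
split=$(git log -1 -m --name-only --pretty="format:%H %s")

echo "=== -c ==="; echo "$combined"
echo "=== -m ==="; echo "$split"
```

Here feature.sql in the merge is identical to the side branch's copy, so the combined listing leaves it out, while the -m listing reports it in the diff against the first parent.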
[1] Two or more, actually, but "more" is unusual; most merge commits have exactly two parents. A merge commit with more than two parents is called an octopus merge.
[2] Both git log and git show have most of git diff built in to them, so that they do not actually have to run additional commands, but it works out the same either way.
[3] I don't know the reason, and I only learned of this particular behavior when I went through the git log source, trying to explain why git log --name-status had not shown something.
[4] This is because --cc is a long option, and in GNU option parsing, all long options like name-only or cc get two dashes, while all short (one letter) options like p get one dash.
The reason you are observing this behavior is that in the case of a merge commit there are two sets of changed files, one relative to each parent. One option here would be to use the --first-parent -m options when running your git log:
git log --name-only --grep="JIRA-154" --first-parent -m --pretty="format:%H %s" -- "*.sql"
This will tell Git to focus only on the main branch into which the merge is happening: each merge is diffed against its first parent alone, so the files listed under a merge are the ones that merge brought into that branch.
Check here for the documentation and here for a great blog post.