Git: How can I find a commit that most closely mat

2019-01-10 18:56发布

问题:

Someone took a version (unknown to me) of Moodle, applied many changes within a directory, and released it (tree here).

How can I determine which commit of the original project was most likely edited to form this tree?

this would allow me to form a branch at the appropriate commit with this patch. Surely it came from either the 1.8 or 1.9 branches, probably from a release tag, but diffing between particular commits doesn't help me much.

Postmortem Update: knittl's answer got me as close as I'm going to get. I first added my patch repo as the remote "foreign" (no commits in common, that's OK), then did diffs in loops with a couple format options. The first used the --shortstat format:

for REV in $(git rev-list v1.9.0^..v1.9.5); do 
    git diff --shortstat "$REV" f7f7ad53c8839b8ea4e7 -- mod/assignment >> ~/rdiffs.txt; 
    echo "$REV" >> ~/rdiffs.txt; 
done;

The second just counted the line changes in a unified diff with no context:

for REV in $(git rev-list v1.9.0^..v1.9.5); do 
    git diff -U0 "$REV" f7f7ad53c8839b8ea4e7 -- mod/assignment | wc -l >> ~/rdiffs2.txt;
    echo "$REV" >> ~/rdiffs2.txt; 
done;

There were thousands of commits to dig through, but this one seems to be the closest match.

回答1:

you could write a script, which diffs the given tree against a revision range in your repository.

assume we first fetch the changed tree (without history) into our own repository:

git remote add foreign git://…
git fetch foreign

we then output the diffstat (in short form) for each revision we want to match against:

for REV in $(git rev-list 1.8^..1.9); do
   git diff --shortstat foreign/master $REV;
done

look for the commit with the smallest amount of changes (or use some sorting mechanism)



回答2:

This was my solution:

#!/bin/sh

start_date="2012-03-01"
end_date="2012-06-01"
needle_ref="aaa"

echo "" > /tmp/script.out;
shas=$(git log --oneline --all --after="$start_date" --until="$end_date" | cut -d' ' -f 1)
for sha in $shas
do
    wc=$(git diff --name-only "$needle_ref" "$sha" | wc -l)
    wc=$(printf %04d $wc);
    echo "$wc $sha" >> /tmp/script.out
done
cat /tmp/script.out | grep -v ^$ | sort | head -5


回答3:

How about using git to create a patch from all versions of 1.8. and 1.9 to this new release. Then you could see which patch makes more 'sense'.

For example, if the patch 'removes' many methods, then it is probably not this release, but one before. If the patch has many sections that don't make sense as a single edit, then it probably isn't this release either.

And so on... In reality, unfortunately, there doesn't exist an algorithm to do this perfectly. I will have to be some heuristic.



回答4:

How about using 'git blame'? It will show you, for each line, who changed it, and in which revision.