Someone took a version (unknown to me) of Moodle, applied many changes within a directory, and released it (tree here).
How can I determine which commit of the original project was most likely edited to form this tree?
This would allow me to create a branch at the appropriate commit and apply this patch there. It surely came from either the 1.8 or the 1.9 branch, probably from a release tag, but diffing against particular commits by hand doesn't help me much.
Postmortem update: knittl's answer got me as close as I'm going to get. I first added my patch repo as the remote "foreign" (no commits in common, but that's OK), then ran diffs in loops with a couple of format options. The first used the --shortstat format:
for REV in $(git rev-list v1.9.0^..v1.9.5); do
git diff --shortstat "$REV" f7f7ad53c8839b8ea4e7 -- mod/assignment >> ~/rdiffs.txt;
echo "$REV" >> ~/rdiffs.txt;
done;
The second just counted the line changes in a unified diff with no context:
for REV in $(git rev-list v1.9.0^..v1.9.5); do
git diff -U0 "$REV" f7f7ad53c8839b8ea4e7 -- mod/assignment | wc -l >> ~/rdiffs2.txt;
echo "$REV" >> ~/rdiffs2.txt;
done;
There were thousands of commits to dig through, but this one seems to be the closest match.
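Because the second loop writes a line count and then the revision on alternating lines, the file can be ranked instead of read by eye. A minimal sketch, assuming exactly that two-line layout; rank_counts is a hypothetical helper name, not a git command:

```shell
#!/bin/sh
# rank_counts [FILE...]
# Input: alternating lines "count" / "revision" (the layout of ~/rdiffs2.txt).
# Output: "count revision" pairs, smallest count first.
rank_counts() {
    # Odd lines hold the count (wc -l may pad it with spaces, $1 strips that);
    # even lines hold the revision the count belongs to.
    awk 'NR % 2 == 1 { count = $1; next } { print count, $1 }' "$@" | sort -n
}
```

Running rank_counts ~/rdiffs2.txt | head -n 5 would then list the five revisions with the smallest diffs. This does not apply directly to the --shortstat file, since an empty diff prints no stat line there and the alternation breaks.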
You could write a script that diffs the given tree against a range of revisions in your repository.
Assume we first fetch the changed tree (without history) into our own repository:
git remote add foreign git://…
git fetch foreign
We then output the diffstat (in short form) for each revision we want to match against:
for REV in $(git rev-list 1.8^..1.9); do
    # print the revision first so each shortstat line can be matched to its commit
    echo "$REV";
    git diff --shortstat foreign/master "$REV";
done
Then look for the commit with the smallest number of changes (or use some sorting mechanism).
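One possible sorting mechanism is to reduce each --shortstat line to a single number (insertions plus deletions) and sort on it. This is only a sketch: changed_lines and rank_revisions are hypothetical helper names, and the sum of insertions and deletions is just one assumed measure of closeness.

```shell
#!/bin/sh
# changed_lines STAT
# STAT is one line of `git diff --shortstat` output, e.g.
#   " 3 files changed, 10 insertions(+), 5 deletions(-)"
# Prints insertions + deletions; prints 0 for an empty string (no differences).
changed_lines() {
    printf '%s\n' "$1" | awk '{
        total = 0
        # the number sits in the field just before "insertion(s)" / "deletion(s)"
        for (i = 2; i <= NF; i++)
            if ($i ~ /^(insertion|deletion)/) total += $(i - 1)
        print total
    }'
}

# rank_revisions TARGET RANGE
# Prints "changed-lines revision" for the five revisions in RANGE closest to TARGET.
rank_revisions() {
    target=$1
    range=$2
    for rev in $(git rev-list "$range"); do
        stat=$(git diff --shortstat "$rev" "$target")
        printf '%s %s\n' "$(changed_lines "$stat")" "$rev"
    done | sort -n | head -n 5
}
```

With the remote from above, this could be called as rank_revisions foreign/master 1.8^..1.9.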
This was my solution:
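#!/bin/sh
# Rank commits in a date window by how many files differ from the needle tree.
start_date="2012-03-01"
end_date="2012-06-01"
needle_ref="aaa"
# start with an empty results file
: > /tmp/script.out
# abbreviated hashes of all commits (on any ref) inside the date window
shas=$(git log --format=%h --all --after="$start_date" --until="$end_date")
for sha in $shas
do
    # number of files that differ between the needle and this commit
    wc=$(git diff --name-only "$needle_ref" "$sha" | wc -l)
    # zero-pad so that a plain lexical sort orders the counts numerically
    wc=$(printf %04d $wc)
    echo "$wc $sha" >> /tmp/script.out
done
# show the five closest candidates
sort /tmp/script.out | head -n 5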
How about using git to create a patch from each version of 1.8 and 1.9 to this new release?
Then you could see which patch makes more 'sense'.
For example, if the patch 'removes' many methods, then it is probably not this release, but one before. If the patch has many sections that don't make sense as a single edit, then it probably isn't this release either.
And so on... In reality, unfortunately, there is no algorithm that does this perfectly. It will have to be some heuristic.
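One way to make the "does this patch make sense" check a little less subjective is to total the added and removed lines for each candidate, since a patch that removes a lot of upstream code points to a base that is too new. A rough sketch, assuming the output of git diff --numstat CANDIDATE FOREIGN is piped in; numstat_totals is a hypothetical helper name:

```shell
#!/bin/sh
# numstat_totals
# Reads `git diff --numstat` output on stdin (added<TAB>deleted<TAB>path).
# Prints "added deleted" summed over all text files.
numstat_totals() {
    # binary files are reported as "-  -  path" and are skipped here
    awk '$1 != "-" { added += $1; deleted += $2 }
         END { printf "%d %d\n", added, deleted }'
}
```

For example, git diff --numstat v1.9.5 foreign/master | numstat_totals prints the two totals for that candidate; a large second number relative to the other candidates would be a reason to look at an earlier commit.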
How about using 'git blame'? It will show you, for each line, who changed it, and in which revision.