I wrote a server-side update hook that checks all commit messages (to check that each one contains an issue ID).
Here is an extract of my Python code (update.py):
```python
import os
[...]
if newrev == "0000000000000000000000000000000000000000":
    newrev_type = "delete"
elif oldrev == "0000000000000000000000000000000000000000":
    newrev_type = "create"
    # HERE IS MY QUESTION, I want to get the commits SHA-1 :-)
else:
    POPnewrev_type = os.popen("git cat-file -t " + newrev)
    newrev_type = POPnewrev_type.read()[0:-1]
    # get the SHA-1
    POPanalyzelog = os.popen("git log " + oldrev + ".." + newrev + " --pretty=#%H")
    analyzelog = POPanalyzelog.read().split('#')
[...]
```
So here, when newrev_type is "delete", the user wants to delete a branch: no problem.
When pushing to an existing branch, we get the SHA-1s of the commits: OK.
But when the user creates a branch, I don't know how to get the SHA-1s...
Do you have any ideas?
Before I answer, some reminders. There are several "stumbling blocks" that trip people up when they write hooks. You have hit the third one in the list below.
In both pre-receive and update, you are given three arguments (in different orders and through different methods: command-line arguments for update, standard input for pre-receive; but the same three arguments in the end, with the same "deal", as it were). Two are the old and new sha1 and the third is the reference name. Let's call them oldrev and newrev (as you did) and the third refname.

When your script finishes, an exit status of 0 allows git to update refname, and a nonzero exit status forbids it. That is, the script is called with a proposal: "I (the git update operation now running) propose to make a change in some label(s)." For the update hook you get each label individually, and each exit status allows or disallows one change; for the pre-receive hook, you get them all in a batch, one per line, on standard input, and your exit status allows or disallows the change as a whole. (If you reject the change in pre-receive, no updates will happen. After pre-receive OKs them, or if it is absent, update gets a chance at them one at a time.)

If the refname starts with "refs/heads/", it is a branch name. Other possibilities include "refs/tags/" and "refs/notes/", although note references are relatively new. Most refnames will point to commit objects, except that tags often (but not always) point to annotated-tag objects.

So here's the first stumbling block: the refname might not be a branch. Make sure that it's OK to apply your logic to tags (and maybe notes), or handle them separately (whichever is appropriate).
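To make the calling conventions concrete, here is a minimal Python sketch of how each hook might pick up its three values and classify the refname. The function names (read_update_args, read_pre_receive, ref_kind) are hypothetical, chosen for illustration; the only facts it relies on are that update receives refname, oldrev and newrev as command-line arguments, while pre-receive reads one "oldrev newrev refname" line per reference from standard input.

```python
from typing import Iterable, List, Tuple

# (oldrev, newrev, refname), in that order, whichever hook supplied them.
Update = Tuple[str, str, str]


def read_update_args(argv: List[str]) -> Update:
    """update hook: argv is [script, refname, oldrev, newrev]."""
    refname, oldrev, newrev = argv[1:4]
    return oldrev, newrev, refname


def read_pre_receive(lines: Iterable[str]) -> List[Update]:
    """pre-receive hook: one "oldrev newrev refname" line per ref on stdin."""
    updates = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        oldrev, newrev, refname = line.split(" ", 2)
        updates.append((oldrev, newrev, refname))
    return updates


def ref_kind(refname: str) -> str:
    """Classify a full refname by its namespace prefix."""
    for prefix, kind in (("refs/heads/", "branch"),
                         ("refs/tags/", "tag"),
                         ("refs/notes/", "notes")):
        if refname.startswith(prefix):
            return kind
    return "other"
```

In an update hook you would call read_update_args(sys.argv), apply your checks only where ref_kind(refname) says they make sense, and finish with sys.exit(0) to allow or a nonzero status to refuse. In pre-receive, read_pre_receive(sys.stdin) gives you the whole batch, and a single exit status covers all of it.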
If the old and new sha1 are both "non-null" (not "0" * 40), the proposal is to move the label. It used to name oldrev and now it will (if you allow it) name newrev.

Here's the second stumbling block: when a label moves, there's no guarantee that the old revision and new revision are related at all. Watch out for "nonsense" results from oldrev..newrev, which occur in that case. You may (or may not, depending on what you're doing) want to verify that oldrev is an ancestor of newrev. (See git merge-base --is-ancestor.)

When the new sha1 is null, the proposal is to remove the label, which is pretty straightforward (everyone seems to get this right instinctively :-) ).
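A small Python sketch of the three cases and of the ancestry check mentioned above. The names classify and is_ancestor are hypothetical; the sketch relies only on git merge-base --is-ancestor exiting with status 0 when the first commit is an ancestor of the second and 1 when it is not (any other status is an error). It assumes it runs from inside the hook, where git already knows which repository to use.

```python
import subprocess

NULL_SHA1 = "0" * 40


def classify(oldrev: str, newrev: str) -> str:
    """Return "create", "delete" or "move" for one proposed ref update."""
    if newrev == NULL_SHA1:
        return "delete"
    if oldrev == NULL_SHA1:
        return "create"
    return "move"


def is_ancestor(oldrev: str, newrev: str) -> bool:
    """True if oldrev is an ancestor of newrev (i.e. a fast-forward move).

    git merge-base --is-ancestor exits 0 for "yes" and 1 for "no";
    any other status means something went wrong, so raise.
    """
    result = subprocess.run(
        ["git", "merge-base", "--is-ancestor", oldrev, newrev])
    if result.returncode == 0:
        return True
    if result.returncode == 1:
        return False
    raise subprocess.CalledProcessError(result.returncode, result.args)
```

In the "move" case you might refuse the push, or just skip the oldrev..newrev scan, when is_ancestor(oldrev, newrev) is False; which of those is right depends on your policy.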
When the old sha1 is null, the proposal is to set a new label. Here's the third stumbling block: that label did not exist before. That tells you nothing about which commit(s), if any, you want to consider to be "part of" the new label. Labels only name one commit, and it is up to whoever interprets them, at some future point, to decide what that label "means".
As an example, suppose I have a copy of your repo (I did a git clone earlier) and am allowed to git push back to it. I decide: gosh, rev 1234567 should have a tag, and rev 5555555 should have a branch label, so I push both new names.

If 1234567 refers to a commit object, I have created a new lightweight tag pointing to that; if it's an annotated tag, I've made a name (probably "another" name) for the annotated tag.
Assuming 5555555 refers to a commit object, I have in fact created a new branch, but what is its "history"? In this case, it probably has none at all: I probably just added the label "in the middle of" some existing branch. (But maybe not: maybe I added it where my master now points, and I am going to rewind master back to origin/master in a moment, after my push finishes.)

The most common answer seems to be "the new branch names any commit starting from newrev but not already named by, or through the parents of, any other branch name". There is a way to find a list of such commits: give git rev-list the newrev, followed by --not and the names of all the existing branches (see notes below). In this case, since you're in a pre-receive or update hook, the new refname has not actually been born yet, so it should not be necessary to exclude it from that list of branch names, but a comment on this answer suggests that sometimes it might be, in which case filtering refname out of the list would do the trick.

But there's another potential stumbling block here, which you can't do anything about in an update hook: if a push is creating more than one branch, the resulting list could depend on the multiple new branch names and/or the order of their creation. In a post-receive hook, you can find all new branch creations, and:
--not arguments to git rev-list as needed.

If you do the latter, beware of the case of creating two or more new branch labels at the same revision: they'll each refer to all of the others' commits.
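The exclusion approach described above (list the commits reachable from newrev that are not reachable from any existing branch) might look like this in Python. This is a sketch, not the answer's original sh: it lists full branch names with git for-each-ref, passes them to git rev-list after --not, and uses subprocess.run, which also runs without a shell by default. The helper names are hypothetical, and the exclude parameter is an assumption meant for cases such as post-receive, where refname (and perhaps other just-created branches) already exist and should be left out of the --not list.

```python
import subprocess
from typing import Iterable, List


def rev_list_args(newrev: str, branches: Iterable[str],
                  exclude: Iterable[str] = ()) -> List[str]:
    """Build: git rev-list NEWREV --not BRANCH... (minus excluded refs)."""
    skip = set(exclude)
    keep = [b for b in branches if b not in skip]
    args = ["git", "rev-list", newrev]
    if keep:
        args += ["--not"] + keep
    return args


def existing_branches() -> List[str]:
    """Full branch names such as refs/heads/master."""
    out = subprocess.run(
        ["git", "for-each-ref", "--format=%(refname)", "refs/heads/"],
        capture_output=True, text=True, check=True).stdout
    return out.splitlines()


def new_branch_commits(newrev: str, exclude: Iterable[str] = ()) -> List[str]:
    """Commits reachable from newrev but not from any (non-excluded) branch."""
    args = rev_list_args(newrev, existing_branches(), exclude)
    out = subprocess.run(args, capture_output=True, text=True,
                         check=True).stdout
    return out.split()
```

In the question's update hook, the "create" branch could then use commits = new_branch_commits(newrev) and check each commit's message just as in the existing-branch case. In post-receive you would pass exclude=[refname], plus any other newly created branch names you decide to treat as one group (which is exactly where the same-revision caveat above applies).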
A final stumbling block (rarely hit): in the post-receive hook, the input stream listing revision numbers and reference names is coming from a pipe, and can only be read once. If you want to read it multiple times, you must save it somewhere, such as a temporary file (so that you can seek back to offset 0, or close and re-open it).
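A minimal sketch of the temporary-file approach, using only the standard library (tempfile and shutil); the function name spool is hypothetical.

```python
import shutil
import tempfile
from typing import IO


def spool(stream: IO[str]) -> IO[str]:
    """Copy a read-once stream (such as a hook's stdin) into a temp file.

    The returned file is rewound to offset 0; call .seek(0) again before
    each additional pass over it.
    """
    saved = tempfile.TemporaryFile(mode="w+")
    shutil.copyfileobj(stream, saved)
    saved.seek(0)
    return saved
```

In the hook you would call spool(sys.stdin) once and then make as many passes over the result as you like. For the small inputs hooks usually see, reading everything into a list with sys.stdin.readlines() works just as well.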
A few final notes:

I'd recommend doing NULL_SHA1 = "0" * 40 earlier in the Python code, and then using rev == NULL_SHA1 as the test. If nothing else, it makes it easy to see that there are exactly forty 0s, and that the point is to check for a "null sha1".

Git may move away from SHA-1, now that SHA-1 has been broken by example (the transition work has since settled on SHA-256). This is not fatal to Git, but it shows that compute power has advanced to the point that it's perhaps unwise to keep depending on SHA-1. In a SHA-256 repository, object names (including the null one) are 64 hex digits instead of 40, so you might want to match against any number of 0s as long as they are all zeros, using re.match('0+$', rev) (or re.search('^0+$', ...) if you prefer re.search for some reason). You can pre-compile this as nullhash = re.compile('^0+$') and then use nullhash.match or nullhash.search (as before, the ^ prefix is only required if you are using the general search rather than the left-anchored match).

Use subprocess.Popen with shell=False for a little more efficiency (it saves firing up "sh") and safety (not a problem with refnames, see git check-ref-format, but just a general rule).

Use git rev-list directly, rather than log with format %H (and study the manual page for git rev-list closely; it's highly relevant to most hooks).

Leave in the refs/heads/ and/or refs/tags/ prefixes: git rev-list is happy with these prefixes, and they make sure that you get the right reference. For instance, if there are both a tag and a branch named master, which one do you get? (You get the tag; but why not use the full name, and not have to remember that?)
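Putting several of these notes together, here is a hedged sketch: a null-hash test that accepts any all-zero object name (40 digits for SHA-1, 64 for SHA-256), and a rev-list wrapper that uses subprocess.Popen without a shell in place of the question's git log ... --pretty=#%H pipeline. The names is_null and rev_list are hypothetical.

```python
import re
import subprocess
from typing import List

NULL_SHA1 = "0" * 40  # the fixed-width test; fine for SHA-1 repositories only

# match() is anchored at the start, so only the end needs "$"; this accepts
# an all-zero name of any length (40 for SHA-1, 64 for SHA-256).
nullhash = re.compile("0+$")


def is_null(rev: str) -> bool:
    return nullhash.match(rev) is not None


def rev_list(*args: str) -> List[str]:
    """Run git rev-list without a shell; return the listed commit IDs."""
    proc = subprocess.Popen(["git", "rev-list", *args],
                            stdout=subprocess.PIPE, universal_newlines=True)
    out, _ = proc.communicate()
    if proc.returncode != 0:
        raise subprocess.CalledProcessError(proc.returncode, proc.args, out)
    return out.split()
```

For the existing-branch case in the question, rev_list(oldrev + ".." + newrev) yields the same commit IDs as the git log pipeline, without the empty first element and trailing newlines that split('#') leaves behind. Full refnames work as arguments too, e.g. rev_list(newrev, "--not", "refs/heads/master").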