I want to automatically push commits in the post-receive hook from a central repo on our LAN to another central repo in the cloud. The LAN repo is created using git clone --mirror git@cloud:/path/to/repo or equivalent commands.
Because the files being committed will be large relative to our upstream bandwidth, it's entirely possible something like this could happen:
- Alice initiates a push to the LAN repo.
- Bill pulls from the LAN repo while the post-receive hook is running.
- The LAN repo is in the middle of pushing to the cloud repo.
- This also means Bill's local repo contains the commits Alice pushed. Confirmed through testing.
- Bill initiates a push to the LAN repo.
- Bill's push is a fast-forward of Alice's push, so the LAN repo will accept it.
When the post-receive hook for the LAN repo executes, a second push from the LAN repo to the cloud repo will start and the two will run concurrently.
I'm not worried about the git objects. The worst-case scenario is that both pushes upload all of the objects from Alice's push, but that shouldn't matter as far as I understand git's internals.
I'm concerned about the refs. Suppose Alice pushed using a much slower connection, so that Bill's push finishes first. Suppose packet loss or something else causes the hook's push of Bill's commits from the LAN repo to the cloud to finish before the hook's push of Alice's commits. If both Alice and Bill are pushing the master branch and Bill's push finishes first, what will the master ref be on the cloud repo? I want it to be Bill's HEAD, since that's the later push, but I'm concerned it will be Alice's HEAD.
Further clarification:
I realize Alice's push from her machine to the LAN repo will fail if Bill's push from his machine to the LAN repo finishes first. In that case, the LAN repo's post-receive hook will not execute. Furthermore, please assume nobody will be doing force pushes, so if the post-receive hook runs on the LAN repo, all ref changes are fast-forwards.
Git 2.4+ (Q2 2015) will introduce atomic pushes, which should make it easier for the server to manage the order of pushes.
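As a rough illustration of what an atomic push looks like from the pushing side, here is a minimal sketch. It assumes Git 2.4 or later on both ends and a server that advertises atomic push support; the helper names atomic_push_command and push_atomically, and the remote and branch names in the usage note, are hypothetical.

```python
import subprocess


def atomic_push_command(remote, refs):
    """Build a `git push --atomic` command line for the given remote and refs."""
    # With --atomic, either every listed ref is updated on the remote or none is.
    return ["git", "push", "--atomic", remote, *refs]


def push_atomically(remote, refs):
    """Run the push; return True only if git exits successfully."""
    return subprocess.run(atomic_push_command(remote, refs)).returncode == 0
```

For example, push_atomically("origin", ["master", "release"]) would update both branches together or leave both untouched. Note that this groups the refs of one push; it does not by itself order two separate pushes.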
See the work done by Stefan Beller (stefanbeller):
- commit d0e8e09: push.c: add an --atomic argument
- --atomic command line argument
- receive-pack.c: negotiate atomic push support
If Bill's push finishes first, Alice's push will fail, because before updating a ref git checks that the ref in the repo still has the value it had when the push started. In this scenario it will not. Alice will see the error message and will need to resolve the issue. The same goes for Bill in the reverse case. So in your post-receive hook, check that the old and new values of each ref actually differ. If they do not, skip the push to the other repo and save some work.
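The check described above, comparing each ref's old and new value inside the post-receive hook, might be sketched as follows. Git feeds the hook one line per updated ref on standard input in the form "old-value new-value ref-name". The function names and the default remote name "origin" are assumptions for illustration, not part of the original setup.

```python
import subprocess
import sys


def changed_refs(lines):
    """Return (old, new, ref) tuples for hook input lines whose values differ."""
    updates = []
    for line in lines:
        parts = line.split()
        if len(parts) != 3:
            continue  # ignore anything that is not "old new ref"
        old, new, ref = parts
        if old != new:
            updates.append((old, new, ref))
    return updates


def is_deletion(new):
    """Git reports an all-zero object name when a ref is deleted."""
    return new.strip("0") == ""


def push_updates(updates, remote="origin"):
    """Push the changed refs without forcing; return git's exit code (0 if idle)."""
    refs = [ref for _old, new, ref in updates if not is_deletion(new)]
    if not refs:
        return 0
    return subprocess.run(["git", "push", remote, *refs]).returncode


def main(stream):
    return push_updates(changed_refs(stream))

# In the installed hook file, the last line would be: sys.exit(main(sys.stdin))
```

Because the push is not forced, the cloud repo should reject an update that is not a fast-forward of what it already has, so a late, older push cannot rewind a ref.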
I still see a problem in your scenario though, and it is with the push to the cloud. You can have the SAME issue with the hook pushing two valid refs up to the cloud location. Except now, if a push fails the first time, the script cannot tell whether it needs to push again, because it does not know whether the failed ref was older or newer than the one already pushed, especially if the updates were not simple fast-forwards, which can happen from time to time. If you simply forced the push regardless, there is a chance the cloud would be left with an OLD ref until another hook run pushes something else up later. In Alice's case, she would have merged the changes from upstream or used any number of other solutions, but the script probably shouldn't have that kind of decision-making capability.
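To decide whether one commit is older or newer than another in the fast-forward sense, a script can ask git whether the first is an ancestor of the second. The sketch below relies on git merge-base --is-ancestor, which exits with 0 when the first commit is an ancestor of the second and 1 when it is not. The function name is_ancestor and the injectable run parameter (there only to make the function easy to exercise without a repository) are assumptions for illustration.

```python
import subprocess


def is_ancestor(older, newer, run=subprocess.run):
    """Return True if moving a ref from `older` to `newer` is a fast-forward."""
    result = run(["git", "merge-base", "--is-ancestor", older, newer])
    if result.returncode == 0:
        return True   # `older` is reachable from `newer`
    if result.returncode == 1:
        return False  # histories diverged, or `newer` is actually behind
    # Any other exit code means git itself failed (for example, unknown commit).
    raise RuntimeError(f"git merge-base failed with exit code {result.returncode}")
```

This only answers the ordering question for commits on one line of history; if the two commits have diverged, neither is an ancestor of the other and a merge decision is still needed.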
In the hook you might be able to do some script magic on the current repo to compare timestamps and the like and only push when the update is a fast-forward, but that seems messy, and it is more likely that a merge is needed anyway. I think a better solution than a post-receive hook is a cron (or otherwise scheduled) task that runs every five minutes (or however often you want) and simply does a git pull on the master branch of your remote mirror. If you don't have access to that repo, you can do the force push from your LAN repo with a cron job instead. I think this is safer than the hook and less complicated. It ensures the branch on the backup cloud is in the correct place every few minutes, and unlike the hook it does not risk pushing an older ref and then never delivering the newest one until a user pushes again.
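A minimal sketch of the scheduled variant follows, assuming a Unix host (it uses fcntl) and assuming the cloud repo is the remote named "origin" in the LAN mirror. A lock file keeps a slow upload from overlapping with the next scheduled run, so only one push to the cloud is ever in flight. The function names and the lock path are hypothetical.

```python
import contextlib
import fcntl
import subprocess


@contextlib.contextmanager
def exclusive_lock(path):
    """Yield True if this run holds the lock, False if another run already does."""
    with open(path, "w") as handle:
        try:
            fcntl.flock(handle, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except BlockingIOError:
            yield False
            return
        try:
            yield True
        finally:
            fcntl.flock(handle, fcntl.LOCK_UN)


def sync_to_cloud(repo_dir, remote="origin", lock_path="/tmp/cloud-sync.lock"):
    """Mirror-push the LAN repo unless a previous sync is still running."""
    with exclusive_lock(lock_path) as acquired:
        if not acquired:
            return False  # still uploading; the next scheduled run will retry
        # --mirror force-updates every ref on the remote to match this repo.
        subprocess.run(["git", "push", "--mirror", remote], cwd=repo_dir, check=True)
        return True
```

A crontab entry with the schedule */5 * * * * that runs a small script calling sync_to_cloud on the LAN repo's directory would give the five-minute interval mentioned above. Each run pushes whatever the LAN refs are at that moment, so a skipped run is simply caught up by the next one.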