Are concurrent git pushes always safe if the secon

Posted 2019-02-08 19:14

Question:

I want to automatically push commits in the post-receive hook from a central repo on our LAN to another central repo in the cloud. The LAN repo is created using git clone --mirror git@cloud:/path/to/repo or equivalent commands.

Because the files being committed will be large relative to our upstream bandwidth, it's entirely possible something like this could happen:

  1. Alice initiates a push to the LAN repo.
  2. Bill pulls from the LAN repo while the post-receive hook is running.
    • The LAN repo is in the middle of pushing to the cloud repo.
    • This also means Bill's local repo contains the commits Alice pushed. Confirmed through testing.
  3. Bill initiates a push to the LAN repo.
    • Bill's push is a fast-forward of Alice's push, so the LAN repo will accept it.

When the post-receive hook for the LAN repo executes, a second push from the LAN repo to the cloud repo will start and the two will run concurrently.

I'm not worried about the git objects. The worst-case scenario is that both pushes upload all of the objects from Alice's push, but that shouldn't matter as far as I understand git's internals.

I'm concerned about the refs. Suppose Alice pushed using a much slower connection, so that Bill's push finishes first. Suppose packet loss or something else causes the hook's push of Bill's commits from the LAN repo to the cloud to finish before the hook's push of Alice's commits. If both Alice and Bill are pushing the master branch and Bill's push finishes first, what will the master ref be on the cloud repo? I want it to be Bill's HEAD, since that's the later push, but I'm concerned it will be Alice's HEAD.

Further clarification:

I realize Alice's push from her machine to the LAN repo will fail if Bill's push from his machine to the LAN repo finishes first. In that case, the LAN repo's post-receive hook will not execute. Furthermore, please assume nobody will be doing force pushes, so if the post-receive hook runs on the LAN repo, all ref changes are fast-forwards.

Answer 1:

If Bill's push finishes first, Alice's push will fail, because before updating a ref git checks that the ref still has the value it had when the push started. In this scenario it will not. Alice will see an error message and will need to resolve the issue. The same goes for Bill in the reverse case. So in your post-receive hook, make sure that the old and new values of the ref actually differ. If they do not, skip the push to the other repo entirely and save some work.
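The check on old and new values could be sketched roughly as follows. This is only an illustration: the function name changed_refs and the remote name "cloud" are made up here. It relies on the fact that git feeds a post-receive hook one line per updated ref on stdin, in the form "<old-value> <new-value> <ref-name>".

```shell
#!/bin/sh
# Sketch of a filter for post-receive input (hypothetical helper name).
# Each stdin line has the form: <old-value> <new-value> <ref-name>

# changed_refs: print the name of every ref whose old and new values differ.
changed_refs() {
    while read -r old new ref; do
        if [ "$old" != "$new" ]; then
            printf '%s\n' "$ref"
        fi
    done
}

# Possible use inside hooks/post-receive, assuming a remote named "cloud":
#   changed_refs | while read -r ref; do git push cloud "$ref:$ref"; done
```

The commented loop pushes each changed ref without forcing, so an update that is not a fast-forward on the cloud side should be rejected rather than silently overwriting a newer ref.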

I still see a problem in your scenario, though, and it is with the push to the cloud. You can have the SAME issue when the hook pushes two valid refs up to the cloud location. Except now, when a push fails, the script won't know whether it needs to push again, because it can't tell whether the ref that failed was older or newer than the one that got pushed, especially if the updates weren't simple fast-forwards, which can happen from time to time. If you simply forced every push, there is a chance the cloud would be left with an OLD ref until a later hook run pushes something else up. Alice, in the same situation, would have merged the changes from upstream or used any number of other solutions, but a script probably shouldn't have that kind of decision-making capability.
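For what it's worth, telling whether moving a ref from one commit to another is a simple fast-forward can be done with git merge-base --is-ancestor, which exits 0 when the first commit is an ancestor of the second. A minimal sketch, where the function name is_fast_forward and the overridable GIT variable are my own conventions rather than anything from git:

```shell
#!/bin/sh
# GIT may be overridden for dry runs; by default it is the real git binary.
GIT=${GIT:-git}

# is_fast_forward OLD NEW: succeed if OLD is an ancestor of NEW, i.e. moving
# a ref from OLD to NEW would be a fast-forward. Run it inside the repository.
is_fast_forward() {
    "$GIT" merge-base --is-ancestor "$1" "$2"
}
```

This only answers the ancestry question. It does not tell the script what to do when the answer is no, which is the decision the script probably shouldn't be making.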

In the hook you might be able to do some script magic on the current repo to compare timestamps and the like and only push if the update is a fast-forward, but that seems messy, and it is more likely that a merge is needed anyway. I think a better solution than a post-receive hook is a cron (or otherwise scheduled) task that runs every five minutes, or however often you want, and simply does a git pull of the master branch on your remote mirror. If you don't have access to that repo, you can do a force push from your LAN repo with a cron job instead. I think this is safer than the hook and less complicated. It ensures the branch on the backup cloud repo is brought to the correct place every few minutes, and it doesn't risk pushing an older ref and never getting the newest one until a user pushes again, as the hook does.
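The cron variant with a force push from the LAN repo might look something like this. Treat it as a sketch under assumptions: the script path, the repo path, the remote name "cloud", and the function name sync_to_cloud are all placeholders.

```shell
#!/bin/sh
# GIT may be overridden for dry runs; by default it is the real git binary.
GIT=${GIT:-git}

# sync_to_cloud GIT_DIR REMOTE: force-push all branches and tags of the bare
# repo at GIT_DIR to REMOTE, overwriting whatever REMOTE currently has.
sync_to_cloud() {
    "$GIT" --git-dir="$1" push --force "$2" \
        'refs/heads/*:refs/heads/*' 'refs/tags/*:refs/tags/*'
}

# Example crontab entry (every five minutes), assuming this file is saved as
# /usr/local/bin/sync-to-cloud.sh and ends with a call such as
#   sync_to_cloud /srv/git/repo.git cloud
#
#   */5 * * * * /usr/local/bin/sync-to-cloud.sh
```

Because each run pushes whatever the LAN repo holds at that moment, a stale ref on the cloud side gets corrected on the next run instead of waiting for another user push.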



Answer 2:

Git 2.4+ (Q2 2015) introduces atomic pushes, which should make it easier for the server to manage the order of pushes.
See the work done by Stefan Beller (stefanbeller):

  • commit ad35eca t5543-atomic-push.sh: add basic tests for atomic pushes

This adds tests for the atomic push option.
The first four tests check if the atomic option works in good conditions and the last three patches check if the atomic option prevents any change to be pushed if just one ref cannot be updated.

  • commit d0e8e09 push.c: add an --atomic argument

    --[no-]atomic
        Use an atomic transaction on the remote side if available.
        Either all refs are updated, or on error, no refs are updated.
        If the server does not support atomic pushes the push will fail.

  • commit 4ff17f1: send-pack.c: add --atomic command line argument

This adds support to send-pack to negotiate and use atomic pushes iff the server supports it. Atomic pushes are activated by a new command line flag --atomic.

  • commit 1b70fe5: receive-pack.c: negotiate atomic push support

This adds the atomic protocol option to allow receive-pack to inform the client that it has atomic push capability.
This commit makes the functionality introduced in the previous commits go live for the serving side.
The changes in documentation reflect the protocol capabilities of the server.

   atomic
   ------

If the server sends the 'atomic' capability it is capable of accepting atomic pushes.
If the pushing client requests this capability, the server will update the refs in one atomic transaction.
Either all refs are updated or none.
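
As a rough illustration of the flag in use, the following wraps git push --atomic. The function name push_atomically, the remote name "cloud", and the branch names are placeholders. Note that the documentation quoted above describes atomicity across the refs of a single push, so this sketch does not by itself order two separate pushes.

```shell
#!/bin/sh
# GIT may be overridden for dry runs; by default it is the real git binary.
GIT=${GIT:-git}

# push_atomically REMOTE REF...: ask REMOTE to update all the given refs in
# one transaction. The push fails if the server lacks the atomic capability.
push_atomically() {
    remote=$1
    shift
    "$GIT" push --atomic "$remote" "$@"
}

# Example (hypothetical remote and branches):
#   push_atomically cloud master develop
```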