How to push history edited with 'git replace'

Posted 2020-07-24 04:58

Question:

I have a remote with a history that looks like this:

As you can see, O and P are merge commits, and each of them closed its old branch, so now there's only one branch.

I want to squash C-D-E-G-J-K-L-N into one commit and F-H-I-M into another commit, because they are just tiny commits cluttering the history.

Locally I managed to squash C-D-E-G-J-K-L-N using the method described in the answer by John O'M. to this question, like this:

git checkout -b squashing-1 N
git reset --soft C~
git commit -m "Squashed history"
git replace N [ID_of_the_commit_i_just_made]

This works: locally, git log from main-branch correctly reports Q, P, O, X, M, I, etc. (X is the new squashed commit).

From here the next steps would be to (1) check out the main branch and merge in the changes, (2) delete the temporary local branch, then (3) push the changes to the remote repo. But (1) and (3) report "Already up to date" and "Everything up-to-date" respectively, since there are no actual changes to the tree, which is exactly the point of all this.

I've also tried git push --force origin main-branch and git push --force-with-lease origin main-branch, but I got the same result: "Everything up-to-date".


How can I correctly merge in these history changes and push them to BitBucket without having to re-create the entire repo?

Answer 1:

You essentially have a choice to make: do you wish to make everyone use the replacement references, or do you prefer to rewrite the entire repository and make everyone have a big flag-day during which they switch from "old repository" to "new repository"? Neither method is particularly fun or profitable. :-)

How replacements work

What git replace does is take a new object in the Git repository (one you made yourself, or one that git replace --edit or --graft makes for you) and give it a name in the refs/replace/ namespace. The name in this namespace is the hash ID of the object that the new object replaces. For instance, if you're replacing commit 1234567..., the new object has some other ID (for concreteness, let's say fedcba9...), and its name is refs/replace/1234567....

The rest of Git, when looking for objects, first checks whether there is a refs/replace/<hash-id> reference. If so (and replacing is not disabled), the rest of Git returns the object to which the refs/replace/ name points, instead of the actual object. So when some other part of Git reads a commit that says "my parent commit is 1234567...", that part of Git goes to find 1234567..., sees that refs/replace/1234567... exists, and returns object fedcba9... instead. You then see the replacement.

If you do not have the reference refs/replace/1234567..., though, your Git never swaps in the replacement object. (This is true whether or not you have the replacement object. It's the reference itself that causes the replacement to occur. Having the reference guarantees that you have the object.)

Hence, for some other Git to execute this same replacement process, you must deliver the refs/replace/ reference to that other Git.
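To see this mechanism in isolation, here is a minimal throwaway sketch in POSIX shell. It assumes git is on PATH; the identity, file name and commit messages are made up for the demo. It shows that git replace moves no branch and only adds a refs/replace/<hash> reference, and that the swap disappears as soon as replacement is turned off:

```shell
set -e
# Hypothetical throwaway identity so the commits work on any machine.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

cd "$(mktemp -d)"
git init -q
echo hello > file
git add file
git commit -q -m "original"
orig=$(git rev-parse HEAD)

# Build a separate commit object: same tree, different message.
repl=$(git commit-tree -m "replacement" "$orig^{tree}")

# This only creates the reference refs/replace/$orig -> $repl.
git replace "$orig" "$repl"
git for-each-ref --format='%(refname) -> %(objectname)' refs/replace/

git rev-parse HEAD                           # still $orig: the branch did not move
git log -1 --format=%s                       # "replacement": content is swapped in
git --no-replace-objects log -1 --format=%s  # "original": swap disabled
```

Because the branch still points at the original hash, there is nothing new on the branch for git push to send; only the refs/replace/ reference is new.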

Transferring replacements from one Git to another

In general, you would push such objects with:

git push <repository> 'refs/replace/*:refs/replace/*'

(or specifically list the one replace reference you wish to push). To fetch these objects:

git fetch <repository> 'refs/replace/*:refs/replace/*'

(You can add this fetch refspec to the fetch configuration in each clone. Using git fetch or git fetch <repository> will then automatically pick up any new replacement objects pushed. Pushing is still a pain, and of course this step has to be repeated on each new clone.)

Note that neither refspec here sets the force flag. It's up to you whether you want to force-overwrite existing refs/replace/ references, should such a thing happen.
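A sketch of the whole round trip, assuming a local bare repository as a stand-in for the hosted remote (paths, identity and messages are hypothetical). It pushes the branch, then pushes only the refs/replace/ namespace, and shows that a fresh clone sees the real history until the non-forcing fetch refspec is added to its configuration:

```shell
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

top=$(mktemp -d)
# A local bare repository stands in for the hosted remote.
git init -q --bare "$top/remote.git"
git --git-dir="$top/remote.git" symbolic-ref HEAD refs/heads/main

git init -q "$top/work"
cd "$top/work"
echo hello > file
git add file
git commit -q -m "original"
git remote add origin "$top/remote.git"
git push -q origin HEAD:refs/heads/main

# Create a replacement, then push only refs/replace/* (no '+', so no force).
orig=$(git rev-parse HEAD)
repl=$(git commit-tree -m "replacement" "$orig^{tree}")
git replace "$orig" "$repl"
git push -q origin 'refs/replace/*:refs/replace/*'

# A plain clone does not fetch refs/replace/*, so it shows the real history.
git clone -q "$top/remote.git" "$top/clone"
cd "$top/clone"
git log -1 --format=%s    # "original"

# Add the non-forcing fetch refspec to this clone's configuration, then fetch.
git config --add remote.origin.fetch 'refs/replace/*:refs/replace/*'
git fetch -q origin
git log -1 --format=%s    # "replacement"
```

Writing '+refs/replace/*:refs/replace/*' instead would let a fetch overwrite an existing local replacement that differs from the remote's.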

Rewriting a repository

Alternatively, once you have replacements in place, you can run a repository-copying operation such as git filter-branch. By this I mean a commit-by-commit copy, not a fast copy like git clone --mirror. If this copying operation is run without disabling replacements, the replaced objects are not copied; instead, their replacements are copied. Hence:

git filter-branch --tag-name-filter cat -- --all

has the side effect of "cementing replacements" forever in the copied repository. You may then discard all the original references and all the replacement references. (The easy way to do this is to clone the filtered repository.)

Of course, since this is a new and different repository, it is not compatible with the original repository or any of its clones. But it no longer requires careful coordination of the refs/replace/ name-space (since it no longer has any replacement objects!).
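A small sketch of "cementing" a replacement, in a throwaway repository with hypothetical commits A, B, C. C is replaced by a commit with C's tree and A as its parent (the squash trick from the question), filter-branch copies the history, and then the backup and replacement refs are dropped. Two assumptions: FILTER_BRANCH_SQUELCH_WARNING=1 only silences the deprecation notice newer Git prints, and the walk is limited to --branches --tags rather than --all so that the refs/replace/ entries are not themselves listed as refs to rewrite:

```shell
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
export FILTER_BRANCH_SQUELCH_WARNING=1

cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/main
for msg in A B C; do
  echo "$msg" > file
  git add file
  git commit -q -m "$msg"
done

# Replace C with a commit that has C's tree but A as its parent,
# so B and C appear "squashed" into one commit.
a=$(git rev-parse main~2)
c=$(git rev-parse main)
squashed=$(git commit-tree -p "$a" -m "B+C squashed" "$c^{tree}")
git replace "$c" "$squashed"

# Commit-by-commit copy; replacements are honored, so they get baked in.
git filter-branch --tag-name-filter cat -- --branches --tags

# Discard the backup ref and the replacement ref, which are no longer needed.
git update-ref -d refs/original/refs/heads/main
git replace -d "$c"

git rev-list --count main   # 2: the squashed history is now the real history
git log --format=%s         # "B+C squashed", then "A"
```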



Answer 2:

From here the next steps would be to (1) check out the main branch and merge in the changes, (2) delete the temporary local branch, then (3) push the changes to the remote repo.

It seems you misunderstand what git replace really did. There is nothing to merge, because the true history isn't changed in any way by git replace. Rather, replace makes a note off to the side that says "by default, when browsing the history, if you find this object, substitute this one instead". You actually can still see the real history, e.g. git --no-replace-objects log.

So replace creates the illusion of a rewritten history. Because it isn't a true rewrite, it doesn't create an "upstream rebase" situation for other developers, which is pretty cool. On the other hand, it cannot be trusted as a way to scrub sensitive data from the repo, since the rewrite really is just an illusion. And the output of git commands can be misleading: it can imply that the "real object" SHA ID is associated with the "replacement object" content (when in fact that content would almost certainly not hash to that SHA).

If you decide to go ahead and share the replacement with origin, what you really need to do is:

git push origin 'refs/replace/*'

Be aware that there are a few known bugs/quirks, and the documentation suggests that there may be unknown bugs/quirks.



Answer 3:

Note: you will need to make sure the server allows it. A new configuration variable, core.usereplacerefs, was added in Git 2.19 (Q3 2018), primarily to help server installations that want to ignore the replace mechanism altogether.

See commit da4398d, commit 6ebd1ca, commit 72470aa (18 Jul 2018) by Jeff King (peff).
(Merged by Junio C Hamano -- gitster -- in commit 1689c22, 15 Aug 2018)

add core.usereplacerefs config option

We can already disable replace refs using a command line option or environment variable, but those are awkward to apply universally. Let's add a config option to do the same thing.

That raises the question of why one might want to do so universally. The answer is that replace refs violate the immutability of objects. For instance, if you wanted to cache the diff between commit XYZ and its parent, then in theory that never changes; the hash XYZ represents the total state.
But replace refs violate that; pushing up a new ref may create a completely new diff.

The obvious "if it hurts, don't do it" answer is not to create replace refs if you're doing this kind of caching.
But for a site hosting arbitrary repositories, they may want to allow users to share replace refs with each other, but not actually respect them on the site (because the caching is more important than the replace feature).
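A minimal sketch of the three switches mentioned in that commit message (config option, environment variable, command-line option), in a throwaway repository with hypothetical names. The last step sets the option persistently in the repository, roughly what a hosting site might do, assuming Git 2.19 or later:

```shell
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

cd "$(mktemp -d)"
git init -q
echo hello > file
git add file
git commit -q -m "original"
orig=$(git rev-parse HEAD)
repl=$(git commit-tree -m "replacement" "$orig^{tree}")
git replace "$orig" "$repl"

git log -1 --format=%s                               # "replacement"
git -c core.useReplaceRefs=false log -1 --format=%s  # "original" (config option)
GIT_NO_REPLACE_OBJECTS=1 git log -1 --format=%s      # "original" (environment variable)
git --no-replace-objects log -1 --format=%s          # "original" (command-line option)

# Persistent setting: the refs/replace/ ref still exists but is ignored.
git config core.useReplaceRefs false
git log -1 --format=%s                               # "original"
```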