So I come from a centralized VCS background and am trying to nail down our workflow in Git (new company, young code base). One question I can't find a simple yet detailed answer to is what exactly a rebase on a remote branch does. I understand it rewrites the history, and in general should be limited to local branches only.
The workflow I'm currently trying to vet involves a remote collaboration branch, with each dev "owning" one for the purpose of sharing code. (With 2 developers, and at most 3 in the foreseeable future, a feature branch for each project and feature request seems excessive, with more overhead than benefit gained.)
Then I came across this answer and tried it, and it accomplished what I'd like: a dev commits and pushes often to his own collab branch, and when he knows what is approved to be released to staging he can rebase remotely (to squash and perhaps reorganize) before merging into develop.
Enter the original question: if the remote branch is for the purpose of collaboration, someone else is bound to pull it sooner or later. If it is a process/training issue to not have the 'guest developer' commit to that collab branch, what actually happens when the branch owner rebases that remote branch?
The main problem with rebasing (or rewriting the history) of published (remote) branches is that it becomes difficult to reintegrate work based on them. So if those remotes are fetched for review only and no commit, even a merge one, is ever made on top of them, you won't generally have many issues. Otherwise, merging and resolving conflicts might soon become a major annoyance.
It's not really evil, it's a matter of implementations and expectations.
We start with a tangle of facts:
Every Git hash represents some unique object. For our purposes here we need only consider commit objects. Each hash is the result of applying a cryptographic hash function (for Git, specifically, it's SHA-1) to the contents of the object. For a commit, the contents include the ID of the source tree; the name and email address and time/date-stamp of the author and committer; the commit message; and most crucially here, the ID of the parent commit.
Changing even just a single bit in the content results in a new, very different hash ID. The cryptographic properties of the hash function, which serve to authenticate and verify each commit (or other object), also mean that it is, for practical purposes, infeasible for some different object to have the same hash ID. Git counts on this for transferring objects between repositories, too.
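To make the hashing fact concrete, here is a minimal sketch of how a SHA-1 object ID can be computed by hand: the hash covers a short header (`<type> <size>` and a NUL byte) followed by the object's content. The author/committer identity, timestamps, and the two parent IDs below are made-up placeholder values, chosen only to show that changing the parent line alone yields a different commit ID.

```python
import hashlib


def git_object_id(kind: str, body: bytes) -> str:
    """SHA-1 over '<kind> <size>' + NUL + body, as Git does for SHA-1 repositories."""
    header = f"{kind} {len(body)}\0".encode()
    return hashlib.sha1(header + body).hexdigest()


def commit_body(tree: str, parent: str, message: str) -> bytes:
    # Hypothetical identity and timestamp, used for both author and committer.
    who = "A Dev <dev@example.com> 1400000000 +0000"
    return (
        f"tree {tree}\n"
        f"parent {parent}\n"
        f"author {who}\n"
        f"committer {who}\n"
        f"\n{message}\n"
    ).encode()


tree = "4b825dc642cb6eb9a060e54bf8d69288fbee4904"  # ID of the empty tree
old_parent = "1" * 40  # placeholder parent IDs, not real commits
new_parent = "2" * 40

original = git_object_id("commit", commit_body(tree, old_parent, "add feature"))
reparented = git_object_id("commit", commit_body(tree, new_parent, "add feature"))

print(original)
print(reparented)
print(original != reparented)  # same tree, author, and message; different ID
```

Because the parent ID is part of the hashed content, every commit that descends from a re-parented commit must get a new ID as well.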
Rebase works (necessarily) by copying commits to new commits. Even if nothing else changes (and usually, the source code associated with the new copies differs from the original source code), the whole point of the rebase is to re-parent some commit chain. For instance, we might start with a history where branch `feature` separates from branch `develop` at commit `*`, but now we would like `feature` to descend from the tip commit of `develop`, so we rebase it. The result is a `feature` that ends in `@--@` on top of the tip of `develop`, where the two `@`s are copies of the original two commits.

Branch names, like `develop`, are just pointers pointing to a (single) commit. The things we tend to think of as "a branch", like the two commits `@--@`, are formed by working backwards from each commit to its parent(s).

Branches are always expected to grow new commits. It's perfectly normal to find that `develop` or `master` has some new commits added on, so that the name now points to a commit (or the last of many commits) that points back to where the name used to point.

Whenever you get your Git to synchronize (to whatever degree) your repository with some other Git and its other repository, your Git and their Git have an exchange of IDs: specifically, hash IDs. Exactly which IDs depends on the direction of the transfer, and any branch names you ask your Git to use.
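The "rebase copies commits" fact above can be sketched with a toy model. This is not how Git is implemented: commits here are just `(parent, message)` pairs whose shortened ID is derived from both, and the names `commit`, `chain`, and `rebase` are hypothetical helpers. The point is only that re-parenting forces new IDs while the originals stay in the store.

```python
import hashlib

commits = {}  # toy object store: commit ID -> (parent ID or None, message)


def commit(parent, message):
    """Store a toy commit whose ID depends on its parent and its message."""
    cid = hashlib.sha1(f"{parent}\n{message}".encode()).hexdigest()[:7]
    commits[cid] = (parent, message)
    return cid


def chain(tip, stop):
    """Commit IDs after `stop` up to and including `tip`, oldest first."""
    out = []
    while tip != stop:
        out.append(tip)
        tip = commits[tip][0]
    return out[::-1]


def rebase(tip, old_base, new_base):
    """Copy each commit in old_base..tip onto new_base; return the new tip."""
    for cid in chain(tip, old_base):
        new_base = commit(new_base, commits[cid][1])
    return new_base


star = commit(None, "*")                                   # the fork commit
develop = commit(commit(star, "develop 1"), "develop 2")   # develop moved on
feature = commit(commit(star, "feature 1"), "feature 2")   # two feature commits
old_feature = feature

feature = rebase(feature, star, develop)  # only the name `feature` is moved

print(chain(feature, develop))   # the two copies (the "@--@" pair)
print(chain(old_feature, star))  # the two originals, still in the store
```

The copies carry the same messages as the originals but have different IDs, which is exactly why anyone who already holds the originals sees the rebased branch as unrelated new history.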
A remote-tracking branch is actually an entity that your Git stores, associated with your repository. Your remote-tracking branch `origin/master` is, in effect, your Git's place to remember "what the Git at `origin` said its `master` was, the last time we talked."

So, now we take these seven items, and look at how `git fetch` works. You might run `git fetch origin`, for instance. At this point, your Git calls up the Git on `origin` and asks it about its branches. They say things like `master = 1234567` and `branch = 89abcde` (though the hash values are all exactly 40 characters long, rather than these 7-character ones).

Your Git may already have these commit objects. If so, we are nearly done! If not, it asks their Git to send those commit objects, and also any additional objects your Git needs to make sense of them. The additional objects are any files that go with those commits, and any parent commit(s) those commits use that you do not already have, plus the parents' parents, and so on, until we get to some commit object(s) that you do have. This gets you all the commits and files you need for any and all new history.[1]
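The fetch steps described above can be sketched as a small graph walk. This is a toy model under stated assumptions, not Git's wire protocol: each repository is a dictionary from commit ID to parent IDs, files are ignored, the clone is not shallow, and the IDs `0000000` and `5555555` are invented to fill in the history behind `1234567` and `89abcde`.

```python
def fetch(remote_commits, remote_refs, local_commits, tracking, remote="origin"):
    """Copy missing commits, then record what the remote said its branches were.

    remote_commits and local_commits map commit ID -> list of parent IDs.
    Returns the IDs that had to be transferred.
    """
    received = []
    todo = list(remote_refs.values())
    while todo:
        cid = todo.pop()
        if cid in local_commits:
            continue  # already here, and (assuming no shallow clone) so are its ancestors
        local_commits[cid] = remote_commits[cid]
        received.append(cid)
        todo.extend(remote_commits[cid])  # keep walking back through the parents
    for name, cid in remote_refs.items():
        tracking[f"{remote}/{name}"] = cid  # e.g. origin/master
    return received


remote_commits = {
    "0000000": [],           # hypothetical shared starting commit
    "1234567": ["0000000"],  # their master
    "5555555": ["0000000"],  # hypothetical first commit on their branch
    "89abcde": ["5555555"],  # their branch
}
remote_refs = {"master": "1234567", "branch": "89abcde"}

local_commits = {"0000000": []}  # we only have the starting commit so far
tracking = {}

print(sorted(fetch(remote_commits, remote_refs, local_commits, tracking)))
print(tracking)
print(fetch(remote_commits, remote_refs, local_commits, tracking))  # nothing new
```

Running the same fetch twice transfers nothing the second time, since the walk stops at the first commit the local side already has.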
Once your Git has all the objects safely stored away, your Git then updates your remote-tracking branches with the new IDs. Their Git just told you that their `master` is `1234567`, so now your `origin/master` is set to `1234567`. The same goes for their `branch`: it becomes your `origin/branch` and your Git saves the `89abcde` hash.

If you now `git checkout branch`, your Git uses `origin/branch` to make a new local label, pointing to `89abcde`. Shortening `1234567` to just `1` and `89abcde` to just `8`, `origin/master` points to commit `1`, while `branch` and `origin/branch` both point to commit `8`, at the end of a short `o--8` chain.

To make things really interesting, let's make our own new commit on `branch`, too. Let's say it gets numbered `aaaaaaa...`, or just `A` for short. Now `origin/branch` still points to `8`, while `branch` points one commit further along, to `A`, giving `o--8--A`.

The interesting question, then, is what happens if they (the Git from which you fetch) rebase something. Suppose, for instance, that they rebase `branch` onto `master`. This copies some number of commits. Now you run `git fetch` and your Git sees that they say `branch = fedcba9`. Your Git checks to see if you have this object; if not, you get it (and its files) and its parent (and that commit's files) and so on until we reach some common point, which will, in fact, be commit `1234567`.

Now you have this: `origin/branch` points to `F`, at the end of a new chain that sits on top of commit `1`, while your own `branch` still ends in the old `o--8--A` chain.
Here I've written `F` for commit `fedcba9`, the one `origin/branch` now points to.

If you come across this later without realizing that the upstream guys rebased their `branch` (your `origin/branch`), you might look at this and think that you must have written all three commits in the `o--8--A` chain, because they're on your `branch` and not on `origin/branch` anymore. But the reason they're not on `origin/branch` is that the upstream abandoned them in favor of the new copies. It's a bit hard to tell that those new copies are, in fact, copies, and that you, too, should abandon the originals (here `o` and `8`), keeping only your own `A`.

[1] If branches grow in the "normal", "expected" way, it's really easy for your Git and their Git to figure out which commits your Git needs from them: your `origin/master` tells you where you saw their `master` last time, and now their `master` points further down a longer chain. The commits you need are precisely those on their `master` that come after the tip of your `origin/master`.

If branches are shuffled around in less-typical ways, it's somewhat harder. In the most general case, they simply have to enumerate all their objects by hash IDs, until your Git tells them that they have reached one you already have. The specific details get further complicated by shallow clones.
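The difference between "normal" growth and a rewrite can be expressed as an ancestry test: if the old tip is still reachable from the new tip, the branch merely grew. The sketch below is a toy model on a dictionary of parent links. The commit names `base`, `9`, and `o'` are invented for illustration (`9` stands for ordinary growth on top of `8`, and `o'` for a rebased copy), and `classify_update` is a hypothetical helper, not a Git command.

```python
def ancestors(commits, tip):
    """Return `tip` plus everything reachable from it by following parents."""
    seen, todo = set(), [tip]
    while todo:
        cid = todo.pop()
        if cid not in seen:
            seen.add(cid)
            todo.extend(commits[cid])
    return seen


def classify_update(commits, old_tip, new_tip):
    """Describe how a remote-tracking name moved between two fetches."""
    if old_tip == new_tip:
        return "unchanged"
    if old_tip in ancestors(commits, new_tip):
        return "fast-forward"  # the branch only grew
    return "rewritten"         # the old tip was abandoned upstream


commits = {
    "base": [],
    "1": ["base"],   # tip of master
    "o": ["base"],
    "8": ["o"],      # branch as first fetched
    "9": ["8"],      # hypothetical ordinary growth on top of 8
    "o'": ["1"],
    "F": ["o'"],     # tip of the rebased copy of branch
}

print(classify_update(commits, "8", "9"))  # fast-forward
print(classify_update(commits, "8", "F"))  # rewritten

# For normal growth, the commits to transfer are those after the old tip.
print(ancestors(commits, "9") - ancestors(commits, "8"))  # {'9'}
```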
It's not impossible
It's not impossible to tell, and since Git version 2.0 or so, there are now built-in tools to let Git figure it out for you. (Specifically, `git merge-base --fork-point`, which is invoked by `git rebase --fork-point`, uses your reflog for `origin/branch` to figure out that the `o--8` chain used to be on `origin/branch` at one point. This only works for the time-period that those reflog entries are retained, but this defaults to at least 30 days, giving you a month to catch up. That's 30 days in your time-line: 30 days from the time you run `git fetch`, regardless of how long ago the upstream did the rebase.)

What this really boils down to is that if you and your upstream agree, in advance, that some particular set of branch(es) get rebased, you can arrange to do whatever is required in your repository every time they do this. With a more typical development process, though, you won't expect them to rebase, and if they don't (if they never "abandon" a published commit that you have fetched) then there's nothing you need to recover from.
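The idea behind the fork-point mode can be sketched with a toy model. This is a simplification, not Git's actual algorithm: the "reflog" is just a list of tips that `origin/branch` has held, history is a dictionary of parent links, and the names `base`, `o'`, `merge_base`, `fork_point`, and `commits_to_replay` are invented for illustration. It compares an ordinary common-ancestor search against one that also remembers where `origin/branch` used to point.

```python
def ancestors(commits, tip):
    """Return `tip` plus everything reachable from it by following parents."""
    seen, todo = set(), [tip]
    while todo:
        cid = todo.pop()
        if cid not in seen:
            seen.add(cid)
            todo.extend(commits[cid])
    return seen


def first_parent_chain(commits, tip):
    """Yield tip, its first parent, and so on back to the root."""
    while tip is not None:
        yield tip
        parents = commits[tip]
        tip = parents[0] if parents else None


def merge_base(commits, mine, upstream):
    """Nearest commit on my chain that the current upstream tip can reach."""
    reachable = ancestors(commits, upstream)
    return next(c for c in first_parent_chain(commits, mine) if c in reachable)


def fork_point(commits, mine, upstream_reflog):
    """Nearest commit on my chain that any remembered upstream tip can reach."""
    reachable = set()
    for old_tip in upstream_reflog:
        reachable |= ancestors(commits, old_tip)
    return next(c for c in first_parent_chain(commits, mine) if c in reachable)


def commits_to_replay(commits, mine, stop):
    """Commits on my chain after `stop`, oldest first."""
    out = []
    for c in first_parent_chain(commits, mine):
        if c == stop:
            break
        out.append(c)
    return out[::-1]


commits = {
    "base": [], "1": ["base"],
    "o": ["base"], "8": ["o"], "A": ["8"],  # my branch: o--8--A
    "o'": ["1"], "F": ["o'"],               # upstream's rebased copies
}
reflog = ["F", "8"]  # tips origin/branch has held, newest first

plain = merge_base(commits, "A", "F")
smart = fork_point(commits, "A", reflog)
print(plain, commits_to_replay(commits, "A", plain))  # base ['o', '8', 'A']
print(smart, commits_to_replay(commits, "A", smart))  # 8 ['A']
```

Without the remembered tip `8`, all three commits look like local work. With it, only `A` is left to carry over onto the rebased upstream.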