I merged an upstream of a large project with my local git repo. Prior to the merge I had a small amount of history that was easy to read through, but after the merge a massive amount of history is now in my repo. I have no need for all the history commits from the upstream repo.
There have been other commits made after this upstream merge that I would like to keep. How do I squash all that history that was merged from the upstream into one commit while keeping the commits made after the upstream merge?
The solution I ended up using was to manually recreate the history. I did this mainly because I didn't want to spend too much time looking for an elegant solution and there wasn't that much history (around 30 commits I'd have to manually merge).
So, I created a branch at the commit just before I merged the huge upstream:
git checkout -b remove-history-fix <commit ID before merge>
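Then re-merged the upstream using the --squash option. A squash merge only stages the result, so it needs an explicit commit afterwards:
git merge --squash <upstream tag>
git commit -m "Squashed upstream merge"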
Then manually cherry-picked the commits after the merge from the old branch (the one with the huge amount of upstream history).
git cherry-pick <commit ID>
After all those commits were merged into my remove-history-fix branch, I removed the branch with the upstream history.
git branch -D <upstream-history-branch>
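The steps above can be tried end to end in a throwaway repository. This is only a sketch under assumptions: the branch named upstream stands in for the real upstream tag, the commit messages and file names are made up, and there is a single commit after the merge instead of around 30.

```shell
set -e
# Throwaway identity so commits work on a machine without git config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master

# Shared starting point.
echo base > base.txt; git add base.txt; git commit -q -m "Initial"

# Hypothetical stand-in for the large upstream: three commits on one branch.
git checkout -q -b upstream
for i in 1 2 3; do
  echo "$i" > "up$i.txt"; git add "up$i.txt"; git commit -q -m "upstream $i"
done

# Local work, the noisy merge, then one commit made after the merge.
git checkout -q master
echo mine > mine.txt; git add mine.txt; git commit -q -m "my local change"
before_merge=$(git rev-parse HEAD)
git merge -q --no-edit upstream
echo later > later.txt; git add later.txt; git commit -q -m "work after merge"
after_merge=$(git rev-parse HEAD)

# Rebuild the history: branch before the merge, squash-merge, cherry-pick.
git checkout -q -b remove-history-fix "$before_merge"
git merge -q --squash upstream >/dev/null
git commit -q -m "Squashed upstream"
git cherry-pick "$after_merge" >/dev/null

old_count=$(git rev-list --count master)
new_count=$(git rev-list --count remove-history-fix)
same_tree=no
git diff --quiet master remove-history-fix && same_tree=yes
echo "old: $old_count commits, new: $new_count commits, same files: $same_tree"
```

The new branch ends up with the same files as the old one but far fewer commits, which is the point of the exercise.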
A couple of options for you:
Limit Logging
Not exactly what you asked for, but possibly a good alternative, and a lot easier. This lets you use git as normal but hides everything you don't want to see. It assumes the problem is history cluttering up your log rather than raw storage space: I think squashing the merge in your branch won't stop git from storing all the upstream commits if you fetched the upstream for the merge in the first place.
In this case, you would do a normal merge, but when logging you would add --first-parent to the command.
For example, without the option I might have the following (assume "sample more" 1 to 3 were actually a lot more commits):
$ git log --oneline
0e151bf Merge remote-tracking branch 'origin/master' into nosquash
f578cbb sample more 3
7bc88cf sample more 2
682b412 sample more 1
fc6e1b3 Merge remote-tracking branch 'origin/master'
29ed293 More stuff
9577f30 my local change
018cb03 Another commit
a5166b1 Initial
But if I add --first-parent, it cleans up to this:
$ git log --oneline --first-parent
0e151bf Merge remote-tracking branch 'origin/master'
fc6e1b3 Merge remote-tracking branch 'origin/master'
9577f30 my local change
018cb03 Another commit
a5166b1 Initial
Notice that all of the commits that came from master after I branched are gone ("my local change" is my divergent commit). Only commits I made show up, including the merges. If I had used better commit messages for the merges, I might even know what each batch of changes was.
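To see the effect in numbers, here is a small sketch in a scratch repository. The branch name upstream and the alias name flog are made up for illustration; only --first-parent itself comes from the explanation above.

```shell
set -e
# Throwaway identity so commits work on a machine without git config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master

echo base > base.txt; git add base.txt; git commit -q -m "Initial"

# Hypothetical upstream side with three commits.
git checkout -q -b upstream
for i in 1 2 3; do
  echo "$i" > "up$i.txt"; git add "up$i.txt"; git commit -q -m "sample more $i"
done

# Our side: one local commit, then a normal (non-squash) merge.
git checkout -q master
echo mine > mine.txt; git add mine.txt; git commit -q -m "my local change"
git merge -q --no-edit upstream

# Count commits with and without following the merged-in side.
all_commits=$(git rev-list --count HEAD)
first_parent_commits=$(git rev-list --count --first-parent HEAD)
echo "all: $all_commits, first-parent only: $first_parent_commits"

# Optional repo-local alias (the name "flog" is arbitrary) to save typing.
git config alias.flog "log --oneline --first-parent"
git --no-pager flog
```

Nothing is rewritten here; the upstream commits are still in the repository and show up again as soon as the option is dropped.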
Replace History
This does what you asked for.
Taking inspiration from https://git-scm.com/book/en/v2/Git-Tools-Replace
What we'll do here is squash the remote's history, replace their history with our squashed version from our perspective, and merge the squashed version.
In my example repository, the revisions that upstream added and that I hadn't merged yet ran from 682b412 "sample more 1" to origin/master (f578cbb "sample more 3"). That's not many for this example, so pretend there are 50 commits or so in between.
The first thing I want is a local branch of the remote side:
git checkout -b squashing origin/master
Next, I want to quickly squash it:
git reset --soft 682b412~
git commit -m "Squashed upstream"
Note the tilde (~) character. It puts our branch at the parent of the first commit in the range we want to squash, and because we specified --soft, our index still matches the last commit in that range. The commit command therefore produces a single commit containing everything from our first through last commit, inclusive.
At this point, the origin/master and squashing branches have identical tree contents but different histories.
Now we tell git that whenever it sees a reference to the original origin/master commit, it should use our squashed commit instead. Using git log I can see the new "Squashed upstream" commit is 1f0bc14, so we do:
git replace f578cbb 1f0bc14
From here on, your git will use the "squashed upstream" commit.
Back on our original branch (assuming it was "master"):
git checkout master
git merge f578cbb
This appears to merge the original origin/master commit (f578cbb) but actually gets 1f0bc14's contents, while the merge commit still records f578cbb as a parent SHA1.
We no longer need the squashing branch, so you can get rid of it.
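The whole replace sequence can be rehearsed in a scratch repository before touching a real one. This is a sketch under assumptions: a local branch named upstream plays the role of origin/master, it carries five made-up commits, and the range to squash is found with git rev-list instead of reading SHA1s off the log by hand.

```shell
set -e
# Throwaway identity so commits work on a machine without git config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master

echo base > base.txt; git add base.txt; git commit -q -m "Initial"

# Hypothetical stand-in for origin/master: five commits we have not merged.
git checkout -q -b upstream
for i in 1 2 3 4 5; do
  echo "$i" > "up$i.txt"; git add "up$i.txt"; git commit -q -m "sample more $i"
done

git checkout -q master
echo mine > mine.txt; git add mine.txt; git commit -q -m "my local change"

# Squash the unmerged upstream range on a temporary branch.
git checkout -q -b squashing upstream
first=$(git rev-list --reverse master..upstream | head -n 1)
git reset -q --soft "$first~"
git commit -q -m "Squashed upstream"
squashed=$(git rev-parse HEAD)
tip=$(git rev-parse upstream)

# From now on, reads of the real upstream tip return the squashed commit.
git replace "$tip" "$squashed"

git checkout -q master
git merge -q --no-edit upstream
git branch -q -D squashing

visible=$(git rev-list --count HEAD)
real=$(git --no-replace-objects rev-list --count HEAD)
tip_subject=$(git log -1 --format=%s upstream)
echo "visible: $visible, underlying: $real, upstream tip shows: $tip_subject"
```

The --no-replace-objects run shows that the original commits are still reachable underneath. As far as I know, the replacement lives under refs/replace/ and is not pushed or fetched by default, so the tidy view applies only to the clone where git replace was run.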
Now, let's say upstream added more features. In this simple example, on upstream's repo, a log might show this:
84f5044 new feature
f578cbb sample more 3
7bc88cf sample more 2
682b412 sample more 1
29ed293 More stuff
018cb03 Another commit
a5166b1 Initial
After we fetch upstream though, if we look at its log from our repo, we see this instead:
84f5044 new feature
f578cbb squashed upstream
29ed293 More stuff
018cb03 Another commit
a5166b1 Initial
Note how the history appears squashed to us as well and, more importantly, the squashed upstream commit shows the SHA1 used in upstream's history (for them it is really the "sample more 3" commit).
So, merging continues to work like normal:
git merge origin/master
But we don't have such a cluttered log:
4a9b5b7 Merge remote-tracking branch 'origin/master' for new feature
46843b5 Merge remote-tracking branch 'origin/master'
84f5044 new feature
f578cbb squashed upstream
fc6e1b3 Merge remote-tracking branch 'origin/master'
29ed293 More stuff
9577f30 my local change
018cb03 Another commit
a5166b1 Initial
If the "new feature" commit in upstream was similarly a large number of commits, we could repeat this process to squash that down as well.
I was able to squash several commits after multiple merges from the master branch using the strategy found here: https://stackoverflow.com/a/17141512/1388104
git checkout my-branch # The branch you want to squash
git branch -m my-branch-old # Change the name to something old
git checkout master # Checkout the master branch
git checkout -b my-branch # Create a new branch
git merge --squash my-branch-old # Get all the changes from your old branch
git commit # Create one new commit
You will have to force an update if you need to push your squashed branch to a remote repository that you have previously pushed to, e.g. git push origin my-branch -f
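Here is a sketch of the same sequence against a scratch bare repository, to show why the forced push is needed. The paths, file names and commit messages are made up, and I have swapped -f for --force-with-lease, which refuses to overwrite the remote branch if someone else has pushed to it in the meantime.

```shell
set -e
# Throwaway identity so commits work on a machine without git config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)
git init -q --bare "$work/remote.git"   # hypothetical remote
git init -q "$work/local"
cd "$work/local"
git symbolic-ref HEAD refs/heads/master
git remote add origin "$work/remote.git"

echo base > base.txt; git add base.txt; git commit -q -m "Initial"
git checkout -q -b my-branch
for i in 1 2 3; do
  echo "$i" > "f$i.txt"; git add "f$i.txt"; git commit -q -m "feature step $i"
done
git push -q origin master my-branch

# The squash sequence from above.
git checkout -q my-branch
git branch -m my-branch-old
git checkout -q master
git checkout -q -b my-branch
git merge -q --squash my-branch-old >/dev/null
git commit -q -m "Feature, squashed"

# A plain push is refused because the branch no longer fast-forwards.
if git push -q origin my-branch 2>/dev/null; then
  plain_push=accepted
else
  plain_push=rejected
fi

git push -q --force-with-lease origin my-branch
remote_tip=$(git ls-remote origin refs/heads/my-branch | cut -f1)
local_tip=$(git rev-parse my-branch)
squashed_commits=$(git rev-list --count master..my-branch)
echo "plain push: $plain_push, commits on branch: $squashed_commits"
```

Anyone else who had the old my-branch checked out will need to reset to the new one, which is the usual cost of rewriting a branch that was already pushed.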
There is no way to do it, as you won't be able to push back or merge again with that remote repository or any other of that same project. When squashing, you are changing history, resulting in different sha1-hashes between your repository and the remote one.
You'll have to live with the large history.