We have a Git repo with 7 contributing developers, over 2.5 years of history, and about 10,000 commits. We use Assembla as the remote we push to and pull from. When we add new developers, cloning the repo to their dev computers takes almost an hour.
I'm not sure if this is the proper terminology, but our goal is to "shrink" the repo by "cutting/snipping" off the first 1.5 years' worth of commits and keeping only the latest year of history. We want to keep a "backup" copy of the entire repo, whether as a separate repo or maybe a branch. We'd like to repeat this in the future and possibly merge the initial split with the new split when needed, but I'm not sure if this is possible. If there is a way to have all the history on a separate branch and keep the master branch with only the history for the last year, that would be great, but let me know of possible pros and cons.
I don't know all the possibilities/options we have, which is why I am here. I read something about patches, but I'm not sure if that is truly what I need or if there is something better/easier. How do you handle an issue like this, and what are the pros and cons? Keep in mind, I still need every developer to be able to continue to push and pull, preferably staying on the master branch.
Thanks in advance!
Disclaimer: You should do this with a test repo first, since the commands listed here can destroy your data.
You can edit the history (or let a script edit it) via the fast-export mechanism. The first step is to run:
git fast-export --signed-tags=strip --no-data --full-tree --export-marks=export.marks branch1 branch2 [...] branchN > commits.fi
Now you have a fast-import stream of all your branches. In this stream, you can drop a commit by deleting the lines from commit refs/... to the trailing newline. You also need to remove the from :<mark> line (and also the merge :<mark> lines, if your new history starts with a merge) from the commits that become your new "first" commits.
To aid this process, look at the revision graph and pick revisions whose history is not crossed by any merge. In the following graph, A, D and F are good candidates to start with. B and C are problematic, because their successor D also depends on G and A.
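One way to check candidates mechanically is sketched below. The graph it builds is a hypothetical stand-in, not the graph referenced above: A, B and C sit on the main line, G branches off A and is merged in as D, followed by E and F. The tag names, the side branch and the throwaway directory are all assumptions made for the demo. The idea is that a commit is a clean cut point when every commit the branch has on top of it is also a descendant of it.

```shell
#!/bin/sh
set -e
# Build a throwaway repo so nothing real is touched.
demo=$(mktemp -d)
cd "$demo"
git init -q
git config user.name "Demo"
git config user.email "demo@example.com"

# Helper: create an empty commit and tag it with its message.
commit() { git commit -q --allow-empty -m "$1"; git tag "$1"; }

commit A
main=$(git symbolic-ref --short HEAD)
commit B
commit C
git checkout -q -b side A      # G branches off A
commit G
git checkout -q "$main"
git merge -q --no-ff -m D side # D merges C and G
git tag D
commit E
commit F

good=""
for c in A B C D E F; do
    # Commits reachable from the branch but not from the candidate.
    all=$(git rev-list --count "$c..$main")
    # Of those, only the ones that actually descend from the candidate.
    on_path=$(git rev-list --count --ancestry-path "$c..$main")
    if [ "$all" -eq "$on_path" ]; then
        good="$good $c"
    fi
done
echo "clean cut points:$good"
```

On this stand-in graph the loop reports A, D, E and F. B and C fail because G is merged in later without descending from either of them. The check covers a single branch only; with several branches it would have to be repeated for each one.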
With the export.marks file you can translate commit IDs to mark numbers in the stream. You need to give your branches new names, because git fast-import won't accept the new history otherwise: the new branches don't contain the commits of the existing ones.
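As a rough sketch of that lookup: each line of the marks file has the form ":<mark> <commit ID>", so a plain text search is enough. The demo repo, its three commits and the variable names below are assumptions for illustration; only the fast-export command itself is taken from the text above.

```shell
#!/bin/sh
set -e
# Throwaway repo with three commits, so the sketch never touches real data.
demo=$(mktemp -d)
cd "$demo"
git init -q
git config user.name "Demo"
git config user.email "demo@example.com"
for n in 1 2 3; do
    echo "$n" > file.txt
    git add file.txt
    git commit -q -m "commit $n"
done
branch=$(git symbolic-ref --short HEAD)

# Same export as above, restricted to the single demo branch.
git fast-export --signed-tags=strip --no-data --full-tree \
    --export-marks=export.marks "$branch" > commits.fi

# Translate a commit ID into its mark.
commit_id=$(git rev-parse HEAD~1)
mark=$(grep " $commit_id\$" export.marks | cut -d' ' -f1)
echo "commit $commit_id is mark $mark"

# Locate the line in the stream where that commit's mark is declared.
grep -n "^mark $mark\$" commits.fi
```

The last grep only finds where the commit block's mark line sits; a commit message that happens to contain a line of the same form would also match, so it is worth checking the hit by eye before deleting anything.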
After you have created the new history, import it into your existing repo with:
git fast-import < manipulated-history.fi
Now it is time to check whether the import was correct. To do so, clone the repo into a temporary one, and in this temporary repo create a graft for each new "first" commit so that it has the same parent revisions as before. Afterwards, run:
git filter-branch newBranch1 newBranch2 [...] newBranchN
The import was correct if each newBranch now points at the same commit as the corresponding original branch. When everything has worked so far, you can create a new repo, pull the newBranch branches from the first working clone into it, and make it the new working repo. Also note that you should not leave any repos with grafts lying around, since grafts can cause harm.
The best step-by-step instructions can be found in the Git SCM blog post "Replace Kicker".
The short summary is this:
1. Create a history branch at the commit where you want to cut, using git branch history <hash>.
2. Create a new base commit with git commit-tree.
3. Rebase the commits that come after history onto your new base.
4. Push the new master branch up to the server.
5. Use git replace to re-connect the history together.
The original post explains it much better, with pictures.
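The summary can be sketched end to end in a throwaway repo. This is only an illustration under assumptions: the five commits, the tags c1 to c5, the cut between c3 and c4 and the base commit message are all made up, and everything happens in one local repo, whereas in practice the history branch would live in a separate repo and the push would go to the real server.

```shell
#!/bin/sh
set -e
demo=$(mktemp -d)
cd "$demo"
git init -q
git config user.name "Demo"
git config user.email "demo@example.com"
for n in 1 2 3 4 5; do
    echo "$n" > file.txt
    git add file.txt
    git commit -q -m "commit $n"
    git tag "c$n"               # tags c1..c5 keep the old commits addressable
done
main=$(git symbolic-ref --short HEAD)

# 1. Keep the old commits on a history branch ending at c4.
git branch history c4

# 2. Create a parentless base commit that reuses the tree of c3.
base=$(echo "Base commit; older commits are on the history branch" |
    git commit-tree "c3^{tree}")

# 3. Replay everything after c3 (here c4 and c5) onto the new base.
git rebase -q --onto "$base" c3 "$main"
short=$(git rev-list --count "$main")   # base + 2 replayed commits

# 4. Pushing the shortened branch is omitted in this local sketch.

# 5. Re-connect: replace the replayed copy of c4 with the original c4.
git replace "$(git rev-parse "$main~1")" c4
full=$(git rev-list --count "$main")    # walks back through c3, c2, c1

echo "without history: $short commits, with replacement: $full commits"
```

The replacement is stored as a local ref, so a fresh clone of the shortened branch stays small, and only people who fetch the history branch and run git replace see the full log again.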
When dealing with complex histories involving merges, this may not work well, depending on how well git rebase --onto works with --preserve-merges (newer Git versions replace that option with --rebase-merges). You should obviously test well before proceeding.