How to Shrink/Cut a Git Repo

Posted 2019-02-09 20:44

We have a Git repo with 7 contributing developers, over 2.5 years of history, and about 10,000 commits. We use Assembla to push and pull from. When we add new developers, cloning the repo to their dev computers takes almost an hour.

I'm not sure if this is the proper terminology, but our goal is to "shrink" the repo by "cutting/snipping" off the first 1.5 years' worth of commits and keeping only the latest year of history. We want to keep a "backup" copy of the entire repo, whether as a separate repo or maybe a branch. We'd like to repeat this in the future and possibly merge the initial split with the new split when needed, but I'm not sure if this is possible. If there is a way to have all the history on a separate branch and keep the master branch with only the history for the last year, that would be great, but let me know of possible pros and cons.

I don't know of all the possibilities/options we have which is why I am here. I read something about patches but I'm not sure if that truly is what I need or if there is something better/easier. What are you guys doing to take care of an issue like this, including pros and cons? Keep in mind, I still need every developer to continue to push and pull, preferably staying on the master branch.

Thanks in advance!

Tags: git branch
2 answers
时光不老,我们不散
#2 · 2019-02-09 20:51

Disclaimer: You should do this with a test repo first, since the commands listed here can destroy your data.

You can edit the history (i.e. let a script edit it) via the fast-export mechanism. The first step is to run git fast-export --signed-tags=strip --no-data --full-tree --export-marks=export.marks branch1 branch2 [...] branchN > commits.fi. Now you have a fast-import stream of all your branches. In this stream, you can drop a commit by deleting the lines from commit refs/... to the trailing newline. You also need to remove the from :<mark> line (and any merge :<mark> lines, if your new history starts with a merge) from the commits that become the new "first" commits.
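As a rough illustration of the export step, here is a minimal sketch that runs the command above against a throwaway repository. The temporary directory, the repository name demo, the file name file.txt, and the three sample commits are all assumptions made up for the demonstration; only the git fast-export invocation comes from the answer.

```shell
#!/bin/sh
set -eu

# Work in a throwaway directory so no real repository is touched.
work=$(mktemp -d)
cd "$work"
git init -q demo
cd demo
git config user.email "demo@example.com"
git config user.name "Demo"

# Three sample commits on the default branch.
for n in 1 2 3; do
    echo "$n" > file.txt
    git add file.txt
    git commit -q -m "commit $n"
done
branch=$(git symbolic-ref --short HEAD)

# Export the branch as a fast-import stream, recording the marks.
git fast-export --signed-tags=strip --no-data --full-tree \
    --export-marks=export.marks "$branch" > commits.fi

# Each exported commit starts with a "commit refs/..." line.
grep -c '^commit refs/' commits.fi
```

The resulting commits.fi is plain text, so it can be inspected in an editor before anything is deleted from it.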

To aid this process, look at the revision graph and pick revisions whose history is not crossed by any merge. In the following graph, A, D and F are good candidates to start with. B and C are poor choices, because their successor D also depends on G and A through the merge.

A ---- B ---- C ---- D --- E ---- F
  \                /  \     \   /
   \--- G ------- H    \--I--J-/

With the export.marks file you can translate commit IDs to the mark numbers used in the stream.
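A small sketch of that lookup, assuming the marks file holds one ":<mark> <commit ID>" pair per line. The helper name mark_for_commit and the throwaway repository are hypothetical and exist only for this example.

```shell
#!/bin/sh
set -eu

# Hypothetical helper: print the mark recorded for a full commit ID.
# $1 = marks file, $2 = full commit ID
mark_for_commit() {
    awk -v id="$2" '$2 == id { print $1 }' "$1"
}

# Throwaway repository with three sample commits.
work=$(mktemp -d)
cd "$work"
git init -q demo
cd demo
git config user.email "demo@example.com"
git config user.name "Demo"
for n in 1 2 3; do
    echo "$n" > file.txt
    git add file.txt
    git commit -q -m "commit $n"
done

# Only the marks file is needed here, so the stream itself is discarded.
git fast-export --no-data --export-marks=export.marks --all > /dev/null

# Look up the mark of the second commit.
second=$(git rev-parse HEAD~1)
mark=$(mark_for_commit export.marks "$second")
echo "commit $second has mark $mark"
```

The printed mark is what appears after "mark", "from" or "merge" in the stream for that commit.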

You need to give your branches new names. Otherwise git fast-import won't accept the new history, because the new branches don't contain the commits of the existing ones.

After you have created the new history, import it into your existing repo with git fast-import < manipulated-history.fi.
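The rename-and-import round trip might look roughly like the sketch below. It does not drop any commits; it only renames the branch in the stream and imports it back, so the new branch should reproduce the full sample history. The new- prefix, the file renamed.fi, and the throwaway repository are assumptions for illustration. The sed expressions only match whole lines, but a commit message line that happens to read exactly "commit refs/heads/<branch>" would be rewritten too.

```shell
#!/bin/sh
set -eu

# Throwaway repository with three sample commits.
work=$(mktemp -d)
cd "$work"
git init -q demo
cd demo
git config user.email "demo@example.com"
git config user.name "Demo"
for n in 1 2 3; do
    echo "$n" > file.txt
    git add file.txt
    git commit -q -m "commit $n"
done
branch=$(git symbolic-ref --short HEAD)

git fast-export --signed-tags=strip --no-data --full-tree "$branch" > commits.fi

# Give the branch a new name in the stream (both "commit" and "reset" lines).
sed -e "s|^commit refs/heads/$branch\$|commit refs/heads/new-$branch|" \
    -e "s|^reset refs/heads/$branch\$|reset refs/heads/new-$branch|" \
    commits.fi > renamed.fi

# Import into the same repository; the blobs already exist there,
# which is why exporting with --no-data is enough.
git fast-import --quiet < renamed.fi

# Number of commits reachable from the imported branch.
git rev-list --count "new-$branch"
```

In a real run, the editing step between export and import is where commits and their from/merge lines would be removed.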

Now it is time to check whether the import was correct. To do so, clone the repo into a temporary one, and in this temporary repo create a graft for each newly created commit so that it has the same parent revisions as before. Afterwards, run git filter-branch newBranch1 newBranch2 [...] newBranchN. The import was correct if each newBranch now points to the same commit as the corresponding original branch.

If everything has worked so far, you can create a new repo, pull the newBranch branches from the first working clone into it, and make it the new working repo. Also note that you should not leave any repos with grafts lying around, since grafts can cause harm.

太酷不给撩
#3 · 2019-02-09 20:56

The best step-by-step instructions can be found on the Git SCM blog post "Replace Kicker".

The short summary is this:

  • Create a new branch that is at the point where you want to cut, say git branch history hash.
  • Push the history to a new repository.
  • Create a new base using git commit-tree.
  • Rebase your post-history commits onto your new base.
  • Push your new truncated master branch up to the server.
  • People can then use git replace to re-connect the history together.
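The steps above can be sketched on a throwaway repository as follows. This is a simplified variant under stated assumptions, not the blog post's exact procedure: the history is linear, the cut is placed after the third of five sample commits, the push steps are left out, and git replace maps the new base commit onto the cut commit (both carry the same tree). All names and paths are made up for the demonstration.

```shell
#!/bin/sh
set -eu

# Throwaway repository with five sample commits.
work=$(mktemp -d)
cd "$work"
git init -q demo
cd demo
git config user.email "demo@example.com"
git config user.name "Demo"
for n in 1 2 3 4 5; do
    echo "$n" > file.txt
    git add file.txt
    git commit -q -m "commit $n"
done
main=$(git symbolic-ref --short HEAD)

# 1. Mark the cut point; this branch would be pushed to the archive repo.
cut=$(git rev-parse HEAD~2)
git branch history "$cut"

# 2. Create a parentless base commit that carries the tree of the cut commit.
base=$(echo "Earlier history lives in the archive repository" |
    git commit-tree "$cut^{tree}")

# 3. Replay everything after the cut onto the new base.
git rebase -q --onto "$base" "$cut" "$main"

# The truncated branch now holds the base plus the two replayed commits.
truncated=$(git rev-list --count "$main")
echo "truncated history: $truncated commits"

# 4. Anyone who also has the old history can reconnect it locally.
git replace "$base" "$cut"
joined=$(git rev-list --count "$main")
echo "reconnected history: $joined commits"
```

The replacement lives in refs/replace/ and is not transferred by a default push or fetch, so each developer decides whether to reconnect the old history.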

The original post explains it much better with pictures.

With complex histories involving merges, this may not work well; it depends on how well git rebase --onto cooperates with --preserve-merges (deprecated in newer Git versions in favor of --rebase-merges). Test thoroughly before proceeding.
