I have a Git repository which contains a number of subdirectories. Now I have found that one of the subdirectories is unrelated to the others and should be detached to a separate repository.
How can I do this while keeping the history of the files within the subdirectory?
I guess I could make a clone and remove the unwanted parts of each clone, but I suppose this would give me the complete tree when checking out an older revision etc. This might be acceptable, but I would prefer to be able to pretend that the two repositories don't have a shared history.
Just to make it clear, I have the following structure:
XYZ/
    .git/
    XY1/
    ABC/
    XY2/
But I would like this instead:
XYZ/
    .git/
    XY1/
    XY2/
ABC/
    .git/
    ABC/
You might need something like "git reflog expire --expire=now --all" before the garbage collection to actually clean the files out. git filter-branch just removes references in the history, but doesn't remove the reflog entries that hold the data. Of course, test this first.
My disk usage dropped dramatically in doing this, though my initial conditions were somewhat different. Perhaps --subdirectory-filter negates this need, but I doubt it.
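A minimal sketch of why the reflog matters here. It does not use filter-branch at all; instead it drops a commit by deleting a branch in a throwaway repository, which leaves the reflog as the only thing pointing at it. The repository name `demo`, the branch `side` and the file names are all made up for the illustration.

```shell
set -eu
# Identity for the demo commits (illustrative values).
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=you@example.com
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=you@example.com

work=$(mktemp -d)          # throwaway location, nothing here touches a real repo
cd "$work"
git init -q demo
cd demo

echo keep > keep.txt
git add keep.txt
git commit -q -m "kept commit"

# Put a commit on a side branch, then delete the branch so that only
# the HEAD reflog still points at that commit.
git checkout -q -b side
echo dropped > dropped.txt
git add dropped.txt
git commit -q -m "commit to be dropped"
dropped=$(git rev-parse HEAD)
git checkout -q -
git branch -q -D side

# gc on its own keeps the commit, because the reflog still references it.
git gc -q --prune=now
git cat-file -e "$dropped"

# Expire the reflog first, and the same gc is now free to delete it.
git reflog expire --expire=now --all
git gc -q --prune=now
```

The same reasoning should apply after a filter-branch run, where the reflog (along with the refs/original backup refs) still points at the pre-rewrite commits.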
This is no longer so complex: you can just use the git filter-branch command on a clone of your repo to cull the subdirectories you don't want, and then push to the new remote.
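As a rough sketch of that approach, assuming the XYZ/ABC layout from the question: the script below builds a throwaway XYZ repository, filters a clone down to ABC, and pushes the result. The clone name `ABC-split` and the local bare repository `ABC.git` (standing in for the new remote) are assumptions made for the example.

```shell
set -eu
export FILTER_BRANCH_SQUELCH_WARNING=1   # skips filter-branch's advisory banner in newer Git
# Identity for the demo commits (illustrative values).
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=you@example.com
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=you@example.com

work=$(mktemp -d)
cd "$work"

# Throwaway stand-in for the XYZ repository from the question.
git init -q XYZ
cd XYZ
mkdir XY1 ABC XY2
echo one > XY1/a.txt
echo abc > ABC/b.txt
echo two > XY2/c.txt
git add .
git commit -q -m "initial layout"
cd "$work"

# Work on a clone so the original repository is never rewritten.
git clone -q --no-hardlinks XYZ ABC-split
cd ABC-split
git filter-branch --prune-empty --subdirectory-filter ABC HEAD
branch=$(git rev-parse --abbrev-ref HEAD)

# A local bare repository stands in for the new remote.
git init -q --bare "$work/ABC.git"
git remote rm origin
git remote add origin "$work/ABC.git"
git push -q origin "$branch"
```

After the filter, the contents of ABC sit at the top level of the rewritten history, so the new remote receives b.txt rather than ABC/b.txt.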
To add to Paul's answer, I found that to ultimately recover space, I have to push HEAD to a clean repository and that trims down the size of the .git/objects/pack directory.
i.e.
After the gc prune, also do:
Then you can do
and the size of ABC/.git is reduced
Actually, some of the time-consuming steps (e.g. git gc) aren't needed when pushing to a clean repository, i.e.:
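A sketch of what that shorter sequence might look like, assuming the XYZ/ABC layout from the question. The temporary clone `ABC-tmp` and the bare repository `ABC.git` are names invented for the example, and the XYZ repository is built on the spot so the script can run on its own.

```shell
set -eu
export FILTER_BRANCH_SQUELCH_WARNING=1   # skips filter-branch's advisory banner in newer Git
# Identity for the demo commits (illustrative values).
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=you@example.com
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=you@example.com

work=$(mktemp -d)
cd "$work"

# Throwaway stand-in for the XYZ repository from the question.
git init -q XYZ
cd XYZ
mkdir XY1 ABC XY2
echo one > XY1/a.txt
echo abc > ABC/b.txt
echo two > XY2/c.txt
git add .
git commit -q -m "initial layout"
echo more >> ABC/b.txt
git commit -q -am "change ABC"
old=$(git rev-parse HEAD)                # pre-rewrite commit, for later comparison
cd "$work"

# Clone and filter, with no gc or prune afterwards.
git clone -q --no-hardlinks XYZ ABC-tmp
cd ABC-tmp
git filter-branch --subdirectory-filter ABC HEAD
git reset -q --hard

# Create the clean bare repository and push only the rewritten HEAD into it.
git init -q --bare "$work/ABC.git"
git push -q "$work/ABC.git" HEAD

# A fresh clone of the clean repository carries only the rewritten history.
cd "$work"
git clone -q ABC.git ABC
```

The push transfers only objects reachable from the rewritten HEAD, which is why the old history never reaches ABC.git and no garbage collection is needed there.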
Paul's answer creates a new repository containing /ABC, but does not remove /ABC from within /XYZ. The following command will remove /ABC from within /XYZ:
Of course, test it in a 'clone --no-hardlinks' repository first, and follow it with the reset, gc and prune commands Paul lists.
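One way such a removal might be written is with an index filter, which is the pattern the filter-branch manual uses for deleting a path from history. This is a sketch under assumptions: the clone name `XYZ-trimmed` and the sample commits are invented for the example, and it runs on a `--no-hardlinks` clone as suggested above.

```shell
set -eu
export FILTER_BRANCH_SQUELCH_WARNING=1   # skips filter-branch's advisory banner in newer Git
# Identity for the demo commits (illustrative values).
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=you@example.com
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=you@example.com

work=$(mktemp -d)
cd "$work"

# Throwaway stand-in for the XYZ repository from the question.
git init -q XYZ
cd XYZ
mkdir XY1 ABC XY2
echo one > XY1/a.txt
echo abc > ABC/b.txt
echo two > XY2/c.txt
git add .
git commit -q -m "initial layout"
echo more >> ABC/b.txt
git commit -q -am "change only ABC"
cd "$work"

# Rewrite a clone, never the original.
git clone -q --no-hardlinks XYZ XYZ-trimmed
cd XYZ-trimmed

# Drop ABC from the index of every commit; --ignore-unmatch keeps commits
# without ABC from failing, and --prune-empty discards commits that end up
# with no changes once ABC is gone.
git filter-branch --index-filter \
    'git rm -r -f -q --cached --ignore-unmatch ABC' \
    --prune-empty HEAD
```

In this sample the second commit touches only ABC, so it disappears entirely from the rewritten history.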
For what it's worth, here is how to do it using GitHub on a Windows machine. Let's say you have a cloned repo residing in C:\dir1. The directory structure looks like this: C:\dir1\dir2\dir3. The dir3 directory is the one I want to be a new separate repo.
GitHub:
MyTeam/mynewrepo
Bash Prompt:
$ cd c:/Dir1
$ git filter-branch --prune-empty --subdirectory-filter dir2/dir3 HEAD
Returned:
Ref 'refs/heads/master' was rewritten
(fyi: dir2/dir3 is case sensitive.)
$ git remote add some_name git@github.com:MyTeam/mynewrepo.git
(git remote add origin etc. did not work; it returned "remote origin already exists".)
$ git push --progress some_name master
I’ve found that in order to properly delete the old history from the new repository, you have to do a little more work after the filter-branch step.
Do the clone and the filter:
Remove every reference to the old history. “origin” was keeping track of your clone, and “original” is where filter-branch saves the old stuff:
Even now, your history might be stuck in a packfile that fsck won’t touch. Tear it to shreds, creating a new packfile and deleting the unused objects:
There is an explanation of this in the manual for filter-branch.
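The three steps above could be sketched as one sequence, assuming the XYZ/ABC layout from the question. The sample repository is built on the spot and packed with git gc first, so that its objects live in a packfile as they would in a long-lived repository.

```shell
set -eu
export FILTER_BRANCH_SQUELCH_WARNING=1   # skips filter-branch's advisory banner in newer Git
# Identity for the demo commits (illustrative values).
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=you@example.com
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=you@example.com

work=$(mktemp -d)
cd "$work"

# Throwaway stand-in for the XYZ repository, packed like a real one.
git init -q XYZ
cd XYZ
mkdir XY1 ABC XY2
echo one > XY1/a.txt
echo abc > ABC/b.txt
echo two > XY2/c.txt
git add .
git commit -q -m "initial layout"
git gc -q
old=$(git rev-parse HEAD)                # pre-rewrite commit, for later comparison
cd "$work"

# Step 1: do the clone and the filter.
git clone -q --no-hardlinks XYZ ABC
cd ABC
git filter-branch --subdirectory-filter ABC HEAD

# Step 2: remove every reference to the old history: the remote-tracking
# refs, filter-branch's backup under refs/original, and the reflog.
git remote rm origin
git update-ref -d "refs/original/$(git symbolic-ref HEAD)"
git reflog expire --expire=now --all

# Step 3: write one new pack of reachable objects and delete the old packs.
git repack -q -ad
```

Note that repack -ad only discards unreachable objects that were stored in packs. Unreachable loose objects would still need git prune or git gc --prune=now.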