I have a two-year-old repository that started off as essentially a private repository, so at different points its history contained key files, encryption keys, large sets of images in various places, and so on. None of these are tracked any longer, but they all still exist in the history.
The source is now becoming shared, since we're bringing on new developers, and I want to make a clean start with a mostly clean repository. However, during this transitional period, I may have to deal with the old repository as well, sharing patches/commits between the two repositories.
What is the best way to break away from the previous history in git and yet retain backwards compatibility, i.e. the ability to share commits between the old repository and the new clean repository, as cleanly as possible?
Objectives:
- Make sensitive commits from far back in the history unavailable in the new repository.
- Allow full functionality in the new repository (clone, push, fetch, everything that's normal for git)
- Maximize the ability for the old repo to recognize patches/commits that come from the new repo
- [Less important] Make the new repo faster by not carrying binaries from ancient commits that are no longer present in the working copy.
Use git branch to create a new branch. Then use git rebase to squash everything from your first commit to your last, so that you get a history-less version of your old branch.
You can always go back to the old branch to get commit-specific details.
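A minimal sketch of what a history-less branch looks like. Note that this swaps in `git checkout --orphan` for the rebase mentioned above, since it gives a single parentless commit without an interactive session. The throwaway repository, the file names (`key.pem`, `app.txt`) and the branch name `clean-start` are made up for illustration.

```shell
#!/bin/sh
set -eu

# Throwaway demo repository standing in for the real project (hypothetical content).
demo=$(mktemp -d)
cd "$demo"
git init -q .
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo"
git config user.email "demo@example.com"

# Old history: a key file is added, then removed again later.
echo "secret" > key.pem
git add key.pem
git commit -q -m "add key file"
git rm -q key.pem
echo "code" > app.txt
git add app.txt
git commit -q -m "remove key, add app"

# Start a branch with no parent; the index keeps the files currently tracked on master.
git checkout -q --orphan clean-start
git commit -q -m "Initial commit of clean history"

new_count=$(git rev-list --count clean-start)
old_count=$(git rev-list --count master)
echo "clean-start: $new_count commit(s), master: $old_count commit(s)"
```

The old `master` branch is still in the same repository, so anyone who can read the repository can still reach the old commits.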
Just creating a new, clean branch in your existing repo won't help: if users can read this branch, they'll also have access to your old branches that contain your sensitive information. To overcome this, you'll have to create a new repo with no (or only limited) knowledge about the past.
To achieve this, I'd do the following:
- Take a relatively recent state of your repo (e.g. the last labeled version, say `V1.0`) and use it as the starting point for a new repo (`newrepo`) that your new developers will use.
- Then, on your machine, add a remote called `oldrepo` that points to the old repository holding the old sensitive data.
- Next, take all commits from `V1.0` up to the latest from `oldrepo` and `cherry-pick` them into your new repo. At this point, your new repo has the same state as `oldrepo` without the dirty history.
- Now, clone a bare repo from `newrepo` (`newrepo.git`). All your developers clone `newrepo.git` and work on it.
- When it comes to taking patches etc. from `newrepo.git` into `oldrepo` or vice versa, this operation is done by you: your colleagues send you the needed patches generated by `format-patch` and you `am` them into the old repo. If you have some fixes done in `oldrepo`, you can again `cherry-pick` them into `newrepo.git` and make them available to your devs.
This limits access to `oldrepo` to you, and your colleagues will never see any sensitive data.
It depends what you specifically mean by "backwards compatibility", but you should be able to specify a `--depth 1` argument to `git clone` (as described in the git-clone man page) and get something where patches can be shared (although commits themselves won't be able to be shared via the normal push/pull mechanism you might be used to).
What you could try (I don’t know if this works) is to create a separate branch that tracks the new development, which starts off from a root commit (i.e. one that has no parent) and just has the content copied in. Then you merge that branch back into the old master (by hand). After that you should be able to develop on the new branch and pull in changes from it to the old branch. And you don’t have to publish the old branch to others.
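The parentless branch plus manual merge could be tried out roughly like this. It is only a sketch on a throwaway repository: the branch name `new`, the file `app.txt` and the commit messages are invented, and `--allow-unrelated-histories` assumes Git 2.9 or later (older versions merge unrelated histories without the flag and do not know it).

```shell
#!/bin/sh
set -eu

demo=$(mktemp -d)
cd "$demo"
git init -q .
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo"
git config user.email "demo@example.com"

# O: two commits of old history on master.
echo "v1" > app.txt
git add app.txt
git commit -q -m "O: old commit 1"
echo "v2" > app.txt
git commit -q -am "O: old commit 2"

# C: parentless branch holding a copy of the current content.
git checkout -q --orphan new
git commit -q -m "C: copy of current content"

# M: manual merge of the parentless branch into the old master.
# Both sides contain identical files, so the merge is clean.
git checkout -q master
git merge -q --allow-unrelated-histories -m "M: manual merge of new into master" new

# Further development happens on the new branch only.
git checkout -q new
echo "v3" > app.txt
git commit -q -am "C: new development"

# m: later merges into master behave like ordinary merges.
git checkout -q master
git merge -q -m "m: merge new development" new

git log --graph --oneline --all
```

Only the `new` branch would be pushed to the shared repository; `master`, which reaches the old commits through `M`, stays private.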
Drawn as a graph, `O` is the original branch, `C` is the copy on a new parentless branch, `M` is the manual merge, and `m` are the subsequent merges.
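For the separate-repository approach described earlier (seed `newrepo` from `V1.0`, cherry-pick from `oldrepo`, publish `newrepo.git`, exchange patches with `format-patch` and `am`), here is a sketch on throwaway repositories. The file names and commit messages are invented. Two details are my own additions rather than part of that answer: `--no-tags` on the remote, so that old tags pointing into the old history are not fetched, and `--no-local` on the bare clone, because per the git-clone documentation a plain local clone copies everything under the objects directory, which would include the objects fetched from `oldrepo`.

```shell
#!/bin/sh
set -eu
work=$(mktemp -d)
cd "$work"

# Stand-in for the existing repository; file names and contents are made up.
git init -q oldrepo
cd oldrepo
git symbolic-ref HEAD refs/heads/master
git config user.name "Maintainer"
git config user.email "maintainer@example.com"
echo "secret" > key.pem
git add key.pem
git commit -q -m "add key file"
git rm -q key.pem
echo "v1" > app.txt
git add app.txt
git commit -q -m "release"
git tag V1.0
echo "v2" > app.txt
git commit -q -am "fix after V1.0"
base=$(git rev-parse 'V1.0^{commit}')
cd "$work"

# Seed newrepo with a snapshot of V1.0 that carries no history.
mkdir newrepo
git -C oldrepo archive V1.0 | tar -xf - -C newrepo
cd newrepo
git init -q .
git symbolic-ref HEAD refs/heads/master
git config user.name "Maintainer"
git config user.email "maintainer@example.com"
git add -A
git commit -q -m "Initial import of V1.0"

# Maintainer only: add oldrepo as a remote. --no-tags keeps old tags
# (which point into the old history) out of newrepo's refs.
git remote add --no-tags oldrepo "$work/oldrepo"
git fetch -q oldrepo

# Replay everything after V1.0 onto the clean history.
git cherry-pick "$base..oldrepo/master" >/dev/null
cd "$work"

# Publish a bare repo for the team. --no-local forces the normal transport,
# so only objects reachable from newrepo's branches and tags are copied.
git clone -q --bare --no-local newrepo newrepo.git

# Opposite direction: a commit made on the clean side travels as a patch.
cd newrepo
echo "v3" > app.txt
git commit -q -am "new feature"
git format-patch -q -1 -o "$work/patches" HEAD
cd "$work"
git -C oldrepo am -q "$work"/patches/*.patch
```

The working `newrepo` still holds the objects fetched from `oldrepo`, so it belongs on the maintainer's machine only; the team should see `newrepo.git` and nothing else.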