I would like to put a Git project on GitHub, but it contains certain files with sensitive data (usernames and passwords, like /config/deploy.rb for Capistrano).
I know I can add these filenames to .gitignore, but this would not remove their history within Git.
I also don't want to start over by deleting the /.git directory.
Is there a way to remove all traces of a particular file from my Git history?
To be clear: the accepted answer is correct. Try it first. However, it may be unnecessarily complex for some use cases, particularly if you encounter obnoxious errors such as 'fatal: bad revision --prune-empty', or if you really don't care about the history of your repo.
An alternative would be:
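A minimal sketch of this alternative, run in a throwaway directory so it is safe to try. Assumptions: a local bare repository stands in for GitHub, config/deploy.rb with a made-up password stands in for the secret, and the branch is called master. Note the .gitignore step: without it, git add . would simply commit the secret again.

```shell
set -e
# Throwaway identity so the demo commits work on any machine.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
cd "$(mktemp -d)"

# Scratch setup: remote.git plays GitHub, "work" has the secret in its history.
git init -q --bare remote.git
git init -q work && cd work
git symbolic-ref HEAD refs/heads/master
mkdir config
echo 'set :password, "hunter2"' > config/deploy.rb
echo 'puts "hello"' > app.rb
git add . && git commit -q -m "Initial commit with secret"
git remote add origin ../remote.git
git push -q -u origin master

# The nuclear option: ignore the secret file, then throw the old history away.
echo 'config/deploy.rb' > .gitignore   # otherwise "git add ." re-commits the secret
rm -rf .git
git init -q
git symbolic-ref HEAD refs/heads/master
git add .
git commit -q -m "Removed history, due to sensitive data"
git remote add origin ../remote.git
git push -q -u --force origin master   # overwrite the remote branch
```

Force-pushing only moves the remote branch; it does not by itself delete the old commit objects on the server, so changing the credentials is still necessary.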
This will, of course, remove all commit history, branches, and issues from both your GitHub repo and your local Git repo. If this is unacceptable, you will have to use an alternate approach.
Call this the nuclear option.
Here is my solution on Windows:
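A sketch of this kind of solution, assuming it is git filter-branch with a --tree-filter that deletes the file from every commit of the current branch. The path config/deploy.rb is taken from the question; everything else (the scratch repository, the file contents) is illustrative. FILTER_BRANCH_SQUELCH_WARNING only silences the deprecation warning and pause that newer Git versions print.

```shell
set -e
# Throwaway identity so the demo commits work on any machine.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
export FILTER_BRANCH_SQUELCH_WARNING=1   # newer Git otherwise pauses with a warning
cd "$(mktemp -d)" && git init -q repo && cd repo

# Scratch history: the secret is committed first, then more work follows.
mkdir config
echo 'set :password, "hunter2"' > config/deploy.rb
echo 'puts "hello"' > app.rb
git add . && git commit -q -m "Add app and deploy config"
echo 'puts "bye"' >> app.rb && git commit -q -am "Extend app"

# Rewrite every commit of the current branch, deleting the file from each checked-out tree.
# In cmd.exe the outer quotes must be double quotes; the inner single quotes protect the path.
git filter-branch --tree-filter "rm -f 'config/deploy.rb'" HEAD
```

filter-branch keeps a backup of the old branch under refs/original, and the rewritten branch still has to be published with git push --force.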
Make sure that the path is correct; otherwise it won't work.
I hope it helps.
So, it looks something like this:
For all practical purposes, the first thing you should be worried about is CHANGING YOUR PASSWORDS! It's not clear from your question whether your Git repository is entirely local or whether you already have a remote repository elsewhere; if it is remote and not secured from others, you have a problem. If anyone has cloned that repository before you fix this, they'll have a copy of your passwords on their local machine, and there's no way you can force them to update to your "fixed" version with the file gone from history. The only safe thing you can do is change your password to something else everywhere you've used it.
With that out of the way, here's how to fix it. GitHub answered exactly that question as an FAQ:
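A sketch of the filter-branch recipe from that FAQ, applied to config/deploy.rb from the question. It runs against a scratch repository, with a local bare repository standing in for GitHub. The tag v1.0, the file contents, and the cleanup of refs/original and the reflog are illustrative additions so that the old commits really become unreachable locally.

```shell
set -e
# Throwaway identity so the demo commits work on any machine.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
export FILTER_BRANCH_SQUELCH_WARNING=1
cd "$(mktemp -d)"

# Scratch setup: remote.git plays GitHub.
git init -q --bare remote.git
git init -q work && cd work
git remote add origin ../remote.git
echo 'puts "hello"' > app.rb
git add app.rb && git commit -q -m "Add app"
mkdir config && echo 'set :password, "hunter2"' > config/deploy.rb
git add config && git commit -q -m "Add deploy config"   # touches only the secret
git tag v1.0
git push -q origin --all && git push -q origin --tags

# Remove the file from every commit on every branch and tag. Commits that become
# empty are dropped (--prune-empty); tags are re-pointed under the same name (cat).
git filter-branch --force --index-filter \
  'git rm --cached --ignore-unmatch config/deploy.rb' \
  --prune-empty --tag-name-filter cat -- --all

# Delete the refs/original backups and expire the reflog so the old
# commits become unreachable, then let gc drop them locally.
git for-each-ref --format='delete %(refname)' refs/original | git update-ref --stdin
git reflog expire --expire=now --all
git gc -q --prune=now

# Overwrite the remote branches and tags.
git push -q origin --force --all
git push -q origin --force --tags
```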
Note for Windows users: use double quotes (") instead of single quotes in this command.
Keep in mind that once you've pushed this code to a remote repository like GitHub and others have cloned that remote repository, you're now in a situation where you're rewriting history. When others try to pull down your latest changes after this, they'll get a message indicating that the changes can't be applied because it's not a fast-forward.
To fix this, they'll have to either delete their existing repository and re-clone it, or follow the instructions under "RECOVERING FROM UPSTREAM REBASE" in the git-rebase manpage.
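For a collaborator with no local commits worth keeping, fetching and hard-resetting has the same effect as deleting and re-cloning. A minimal sketch with two scratch clones ("alice" rewrites, "bob" recovers) of a local bare repository, assuming the branch is master. Anyone with unpushed work should follow the manpage's recovery section instead, because reset --hard discards it.

```shell
set -e
# Throwaway identity so the demo commits work on any machine.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
cd "$(mktemp -d)"
git init -q --bare remote.git
git --git-dir=remote.git symbolic-ref HEAD refs/heads/master   # so clones check out master

# Alice publishes a commit containing the secret; Bob clones it.
git init -q alice && cd alice
git symbolic-ref HEAD refs/heads/master
mkdir config && echo 'set :password, "hunter2"' > config/deploy.rb
echo 'puts "hello"' > app.rb
git add . && git commit -q -m "Initial commit"
git remote add origin ../remote.git
git push -q -u origin master
cd .. && git clone -q remote.git bob

# Alice rewrites the published history and force-pushes it.
cd alice
git rm -q --cached config/deploy.rb
git commit -q --amend --no-edit
git push -q --force origin master

# Bob has no local work to keep, so he replaces his copy of the old history.
cd ../bob
git fetch -q origin
git reset -q --hard origin/master
```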
In the future, if you accidentally commit some changes with sensitive information but notice before pushing to a remote repository, there are some easier fixes. If your last commit is the one that added the sensitive information, you can simply remove the sensitive information, then run:
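A minimal sketch of that fix in a scratch repository. The usual command for this is git commit -a --amend; --no-edit is added here only so the old message is reused without opening an editor. The sketch assumes the sensitive file should stay on disk but no longer be tracked, hence git rm --cached plus a .gitignore entry.

```shell
set -e
# Throwaway identity so the demo commits work on any machine.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
cd "$(mktemp -d)" && git init -q repo && cd repo

# Scratch history: the last (unpushed) commit accidentally includes the secret file.
echo 'puts "hello"' > app.rb
git add app.rb && git commit -q -m "Add app"
mkdir config && echo 'set :password, "hunter2"' > config/deploy.rb
echo 'puts "bye"' >> app.rb
git add . && git commit -q -m "Extend app and add deploy config"

# Stop tracking the file (it stays on disk), ignore it, and fold that into the last commit.
git rm -q --cached config/deploy.rb
echo 'config/deploy.rb' > .gitignore && git add .gitignore
git commit -q -a --amend --no-edit
```

The pre-amend commit stays reachable through the local reflog until it expires, which is harmless as long as it was never pushed.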
That will amend the previous commit with any new changes you've made, including entire file removals done with git rm. If the changes are further back in history but still not pushed to a remote repository, you can do an interactive rebase (git rebase -i) onto your remote branch. That opens an editor with the commits you've made since your last common ancestor with the remote repository. Change "pick" to "edit" on any lines representing a commit with sensitive information, then save and quit. Git will walk through the changes and stop at each commit you marked, where you can remove the sensitive information, amend the commit with git commit -a --amend, and resume with git rebase --continue.
Do this for each change with sensitive information. Eventually, you'll end up back on your branch, and you can safely push the new changes.
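A sketch of that workflow in a scratch repository, assuming the remote is called origin and the branch master. To keep it non-interactive, GIT_SEQUENCE_EDITOR is set to a sed command that turns the first "pick" (the oldest unpushed commit, which here is the one with the password) into "edit"; interactively you would make that change in the editor yourself. Reading the password from an environment variable is just one illustrative way to scrub the secret.

```shell
set -e
# Throwaway identity so the demo commits work on any machine.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
cd "$(mktemp -d)"
git init -q --bare remote.git
git init -q work && cd work
git symbolic-ref HEAD refs/heads/master
echo 'puts "hello"' > app.rb
git add app.rb && git commit -q -m "Add app"
git remote add origin ../remote.git
git push -q -u origin master               # last common ancestor with the remote

# Two local, unpushed commits; the first one leaks a password.
mkdir config && echo 'set :password, "hunter2"' > config/deploy.rb
git add config && git commit -q -m "Add deploy config"
echo 'puts "bye"' >> app.rb && git commit -q -am "Extend app"

# Mark the oldest unpushed commit as "edit" instead of "pick".
GIT_SEQUENCE_EDITOR="sed -i.bak '1s/^pick/edit/'" git rebase -i origin/master

# The rebase has stopped at the marked commit: scrub the data, amend, continue.
echo 'set :password, ENV.fetch("DEPLOY_PASSWORD")' > config/deploy.rb
git commit -q -a --amend --no-edit
git rebase --continue
```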
If you have already pushed to GitHub, the data is compromised even if you force push it away one second later because:
GitHub keeps dangling commits for a long time.
GitHub staff do have the power to delete such dangling commits if you contact them, however, which is what you should do: see "How to remove a dangling commit from GitHub?"
Dangling commits can be seen either through:
One convenient way to then get the source at that commit is the "download ZIP" URL, which accepts any reference, e.g.: https://github.com/cirosantilli/myrepo/archive/SHA.zip
It is possible to fetch the missing SHAs either by:
listing the user's recent public events through the API and looking for "type": "PushEvent" entries, e.g. mine: https://api.github.com/users/cirosantilli/events/public (Wayback Machine)
There are scrapers like http://ghtorrent.org/ and https://www.githubarchive.org/ that regularly poll GitHub data and store it elsewhere.
I could not find out whether they scrape the actual commit diffs, but it is technically possible.
To test this out, I have created a repo: https://github.com/cirosantilli/test-dangling and done:
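A local analogue of that experiment, with a bare repository standing in for GitHub (so it says nothing about GitHub's own retention policy): push a commit containing a secret, force-push it away, and check that the remote still stores the object. It only shows that a force push does not delete objects by itself.

```shell
set -e
# Throwaway identity so the demo commits work on any machine.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
cd "$(mktemp -d)"
git init -q --bare remote.git              # local stand-in for the GitHub repository
git init -q work && cd work
git symbolic-ref HEAD refs/heads/master
git remote add origin ../remote.git
echo 'puts "hello"' > app.rb
git add app.rb && git commit -q -m "Add app"
git push -q -u origin master

# Push a commit with a secret, then force-push it away immediately.
echo 'set :password, "hunter2"' > deploy.rb
git add deploy.rb && git commit -q -m "Oops"
leaked=$(git rev-parse HEAD)
git push -q origin master
git reset -q --hard HEAD~1
git push -q --force origin master

# No branch points at the commit any more, but the remote still has its contents.
git --git-dir=../remote.git cat-file -p "$leaked:deploy.rb"
```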
If you delete the repository, however, the commits disappear immediately, even from the API, which then returns 404, e.g. https://api.github.com/repos/cirosantilli/test-dangling-delete/commits/8c08448b5fbf0f891696819f3b2b2d653f7a3824 This works even if you recreate another repository with the same name.
So my recommended course of action is:
change your credentials
if that is not enough (e.g. naked pics):
I've had to do this a few times to date. Note that this only works on one file at a time.
Get a list of all commits that modified a file. The one at the bottom will be the first commit:
git log --pretty=oneline --branches -- pathToFile
To remove the file from history, take the SHA-1 of that first commit and the path to the file from the previous command, and fill them into this command:
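git filter-branch --index-filter 'git rm --cached --ignore-unmatch <path-to-file>' -- <sha1-where-the-file-was-first-added>^..
Note the ^ after the SHA-1: a range written as A.. excludes commit A itself, so without it the commit that added the file would not be rewritten. If that commit is the very first commit of the repository (it has no parent), use HEAD as the range instead.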
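A scratch-repository run of the two steps above, with config/deploy.rb as the example path and a made-up file content. The secret is deliberately added in the second commit rather than the root commit, because the ^ needs the first commit to have a parent.

```shell
set -e
# Throwaway identity so the demo commits work on any machine.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
export FILTER_BRANCH_SQUELCH_WARNING=1
cd "$(mktemp -d)" && git init -q repo && cd repo

# Scratch history: the secret arrives in the second commit, not the root commit.
echo 'puts "hello"' > app.rb
git add app.rb && git commit -q -m "Add app"
mkdir config && echo 'set :password, "hunter2"' > config/deploy.rb
git add config && git commit -q -m "Add deploy config"
echo 'puts "bye"' >> app.rb && git commit -q -am "Extend app"

# Step 1: the oldest commit touching the file is the last line of the log.
first=$(git log --pretty=oneline --branches -- config/deploy.rb | tail -n 1 | cut -d' ' -f1)

# Step 2: rewrite from that commit onwards; "^" includes the commit itself in the range.
git filter-branch --index-filter \
  'git rm --cached --ignore-unmatch config/deploy.rb' -- "$first^.."
```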