If I create a Git repository and publish it publicly (e.g. on GitHub), and a contributor later asks me to remove or obscure their name for whatever reason, is there an easy way of doing so?
Basically, I have had such a request and may want to replace their name and e-mail address with something like "Anonymous Contributor" or maybe a SHA-1 hash of their e-mail address or something like that.
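Jeff is quite right: the right track is git filter-branch. Its --env-filter option takes a shell snippet that modifies the environment variables (GIT_AUTHOR_NAME, GIT_AUTHOR_EMAIL and so on) used to re-create each commit. For your use case, you probably want something like this: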
git filter-branch --env-filter '
if [ "$GIT_AUTHOR_NAME" = "Niko Schwarz" ]; then \
export GIT_AUTHOR_NAME="Jon Doe" GIT_AUTHOR_EMAIL="john@bugmenot.com"; \
fi
'
You can test that it works like this:
$ cd /tmp
$ mkdir filter-branch && cd filter-branch
$ git init
Initialized empty Git repository in /private/tmp/filter-branch/.git/
$
$ touch hi && git add . && git commit -m bla
[master (root-commit) 081f7f5] bla
0 files changed, 0 insertions(+), 0 deletions(-)
create mode 100644 hi
$ echo howdi >> hi && git commit -a -m bla
[master a466a18] bla
1 files changed, 1 insertions(+), 0 deletions(-)
$ git log
commit a466a18e4dc48908f7ba52f8a373dab49a6cfee4
Author: Niko Schwarz <niko.schwarz@gmail.com>
Date: Thu Aug 12 09:43:44 2010 +0200
bla
commit 081f7f50921edc703b55c04654218fe95d09dc3c
Author: Niko Schwarz <niko.schwarz@gmail.com>
Date: Thu Aug 12 09:43:34 2010 +0200
bla
$
$ git filter-branch --env-filter '
> if [ "$GIT_AUTHOR_NAME" = "Niko Schwarz" ]; then \
> export GIT_AUTHOR_NAME="Jon Doe" GIT_AUTHOR_EMAIL="john@bugmenot.com"; \
> fi
> '
Rewrite a466a18e4dc48908f7ba52f8a373dab49a6cfee4 (2/2)
Ref 'refs/heads/master' was rewritten
$ git log
commit 5f0dfc0dc9a325a3f3aaf4575369f15b0fb21fe9
Author: Jon Doe <john@bugmenot.com>
Date: Thu Aug 12 09:43:44 2010 +0200
bla
commit 3cf865fa0a43d2343b4fb6c679c12fc23f7c6015
Author: Jon Doe <john@bugmenot.com>
Date: Thu Aug 12 09:43:34 2010 +0200
bla
Please beware: there is no way to change the author's name without changing the hash of that commit and of every commit after it. That will make later merging a pain for people who have already cloned your repository.
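Note that the filter above touches only the author fields; if the contributor also appears as committer, their name survives in the rewritten history. Here is a minimal sketch that rewrites both and matches on the e-mail address instead of the name. The replacement name and the example.invalid address are placeholders I made up, and the throwaway repository exists only so the sketch can be run as-is:

```shell
#!/bin/sh
# Sketch: rewrite author AND committer identity for one contributor.
# OLD_EMAIL, NEW_NAME and NEW_EMAIL are placeholder values; substitute your own.
set -e

# Throwaway repository with two commits by the contributor.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Niko Schwarz"
git config user.email "niko.schwarz@gmail.com"
touch hi && git add hi && git commit -q -m bla
echo howdi >> hi && git commit -q -a -m bla

# Newer Git versions print a long warning before running filter-branch;
# the environment variable only silences that warning.
FILTER_BRANCH_SQUELCH_WARNING=1 git filter-branch -f --env-filter '
OLD_EMAIL="niko.schwarz@gmail.com"
NEW_NAME="Anonymous Contributor"
NEW_EMAIL="anonymous@example.invalid"
if [ "$GIT_AUTHOR_EMAIL" = "$OLD_EMAIL" ]; then
    export GIT_AUTHOR_NAME="$NEW_NAME" GIT_AUTHOR_EMAIL="$NEW_EMAIL"
fi
if [ "$GIT_COMMITTER_EMAIL" = "$OLD_EMAIL" ]; then
    export GIT_COMMITTER_NAME="$NEW_NAME" GIT_COMMITTER_EMAIL="$NEW_EMAIL"
fi
' -- --all

# Show author and committer of every rewritten commit.
git log --format='%an <%ae> | %cn <%ce>'
```

On a real repository you would run only the filter-branch command. The trailing "-- --all" asks it to rewrite every branch and tag rather than just the current branch.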
If you ever have to "anonymize" a Git repo not just for one user but for all users, Git 2.2 (November 2014) provides an interesting feature in the enhanced git fast-export:
See commit a872275 and commit 75d3d65 by Jeff King (peff):
teach fast-export an --anonymize option:
Sometimes users want to report a bug they experience on their repository, but they are not at liberty to share the contents of the repository.
It would be useful if they could produce a repository that has a similar shape to its history and tree, but without leaking any information.
This "anonymized" repository could then be shared with developers (assuming it still replicates the original problem).
This patch implements an "--anonymize" option to fast-export, which generates a stream that can recreate such a repository.
Producing a single stream makes it easy for the caller to verify that they are not leaking any useful information. You can get an overview of what will be shared by running a command like:
git fast-export --anonymize --all |
perl -pe 's/\d+/X/g' |
sort -u |
less
which will show every unique line we generate, modulo any numbers (each anonymized token is assigned a number, like "User 0", and we replace it consistently in the output).
In addition to anonymizing, this produces test cases that are relatively small (compared to the original repository) and fast to generate (compared to using filter-branch, or modifying the output of fast-export yourself).
Doc:
If the --anonymize option is given, git will attempt to remove all identifying information from the repository while still retaining enough of the original tree and history patterns to reproduce some bugs.
With this option, git will replace all refnames, paths, blob contents, commit and tag messages, names, and email addresses in the output with anonymized data.
Two instances of the same string will be replaced equivalently (e.g., two commits with the same author will have the same anonymized author in the output, but bear no resemblance to the original author string).
The relationship between commits, branches, and tags is retained, as well as the commit timestamps (but the commit messages and refnames bear no resemblance to the originals).
The relative makeup of the tree is retained (e.g., if you have a root tree with 10 files and 3 trees, so will the output), but their names and the contents of the files will be replaced.
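To actually build the anonymized repository, the stream can be piped into git fast-import in a fresh repository. This is a small sketch; the throwaway source repository and the temporary directory names are purely illustrative:

```shell
#!/bin/sh
# Sketch: replay an anonymized export into a new, empty repository.
set -e
work=$(mktemp -d)

# Throwaway source repository with an identifiable author and message.
git init -q "$work/src"
cd "$work/src"
git config user.name "Niko Schwarz"
git config user.email "niko.schwarz@gmail.com"
echo howdi > hi && git add hi && git commit -q -m "secret message"

# Fresh repository that receives the anonymized history.
git init -q "$work/anon"

# Export everything in anonymized form and import it on the other side.
git fast-export --anonymize --all | git -C "$work/anon" fast-import --quiet

# Same number of commits, but names, e-mails and messages are replaced.
git -C "$work/anon" log --all --format='%an <%ae> %s'
```

Because this anonymizes file contents, paths and messages as well, the result is only useful for sharing the shape of a history (for a bug report, say), not as a replacement for the published repository.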
You can make the change in your local repository, git commit --amend the appropriate commit (where you added the name; note that --amend only rewrites the most recent commit), and then git push --force to update GitHub with your version of the repository.
The original commit with the contributor's name will still be available in the reflog (until it expires), but it would take a lot of effort to find it. If this is a concern, you can obliterate that specific commit from the reflog too; see git help reflog for the syntax and how to find it in the list.
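For the single-commit case, the steps above might look like the following sketch. It assumes the name appears as the author of the most recent commit; a local bare repository stands in for GitHub, and all names, addresses and paths are hypothetical:

```shell
#!/bin/sh
# Sketch: amend the latest commit's author, force-push, then expire the reflog.
set -e
work=$(mktemp -d)

# A local bare repository plays the role of the published remote.
git init -q --bare "$work/remote.git"
git init -q "$work/local"
cd "$work/local"
git config user.name "Repo Owner"
git config user.email "owner@example.invalid"
git remote add origin "$work/remote.git"

# The most recent commit carries the contributor's name as author.
echo howdi > hi && git add hi
git commit -q -m "contribution" --author="Niko Schwarz <niko.schwarz@gmail.com>"
git push -q origin HEAD

# Replace the author on that commit and overwrite the published branch.
git commit -q --amend --no-edit --author="Anonymous Contributor <anonymous@example.invalid>"
git push -q --force origin HEAD

# Optionally drop the old commit from the local reflog and object store.
git reflog expire --expire=now --all
git gc -q --prune=now

# The remote now lists only the replacement author.
git --git-dir="$work/remote.git" log --all --format='%an <%ae>'
```

The reflog and gc steps clean up only your local clone; they do nothing about copies that other people have already fetched.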
If you want to change more than one commit, check out the man page for git filter-branch --env-filter.
You can use git-filter-branch to change the content or metadata of previous commits.
Note that since you're not dealing with a purely local branch (it has already been pushed to GitHub), you have no way to remove the author from the copies held by anyone who has already cloned your branch.
It's also generally bad practice to modify a branch that has already been published, since it can lead to confusion for people who are tracking the branch.