My repository has a large number of large files.
They are mostly data (text).
Sometimes I need to move these files to another location because of refactoring or packaging.
I use the git mv command to "rename" the paths of the files, but it seems inefficient: the size of the commit (the actual diff size) is huge, the same as doing git rm followed by git add.
Is there another way to reduce the commit size?
Or should I just add these files to .gitignore and upload them upstream as a zip file?
Thank you for the answers.
FYI, the following commands print the size of the file bar:
git mv foo bar
git commit -m "modify"
git cat-file -s HEAD:bar
From this I concluded that Git did an rm followed by an add.
Could you tell me whether this number is related to the actual size of the commit or not?
By design, if you move a file inside a Git repository without changing its content, creating a commit only stores new metadata (tree objects, plus the commit object itself) to represent the new file location.
Since the content is unchanged, Git doesn't need to create a new blob object to store it.
So the "commit size" should be quite small.
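A minimal sketch to check this yourself, assuming a POSIX shell with git on PATH. It builds a throwaway repository with a hypothetical 1 MiB text file named foo, renames it with git mv, and compares the blob IDs and the number of loose objects before and after the rename commit. The helper names rename_demo and loose_objects are illustrative only.

```shell
#!/bin/sh
# Sketch: does a pure `git mv` store the file content again?
# Assumes a POSIX shell and git on PATH; works in a throwaway temp repository.
set -eu

# Print the number of loose objects in the current repository.
loose_objects() {
  git count-objects | cut -d' ' -f1
}

# Hypothetical demo function. The body runs in a subshell ( ... ),
# so the `cd` and the cleanup trap do not affect the caller.
rename_demo() (
  repo=$(mktemp -d)
  trap 'rm -rf "$repo"' EXIT
  cd "$repo"
  git init -q
  git config user.name demo
  git config user.email demo@example.com

  # 1 MiB of text standing in for one of the large data files.
  head -c 1048576 /dev/zero | tr '\0' 'x' > foo
  git add foo
  git commit -q -m "add foo"

  before=$(loose_objects)
  git mv foo bar
  git commit -q -m "modify"
  after=$(loose_objects)

  # Same blob ID before and after means the content was not stored again.
  if [ "$(git rev-parse HEAD~1:foo)" = "$(git rev-parse HEAD:bar)" ]; then
    echo "blob reused"
  else
    echo "blob rewritten"
  fi
  # Expected: one new root tree plus one new commit object.
  echo "new objects: $((after - before))"
)

rename_demo
```

On a default Git setup this should print "blob reused" and "new objects: 2". The rename costs one small tree object and one commit object, whatever the size of the file.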
Since you say the diff size is huge, I suppose some file content was modified along with the relocation. That would explain a huge "commit size".
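One way to check whether content changed along with the relocation is to run git diff with rename detection (-M) and look at the similarity score. The sketch below assumes a POSIX shell with git and seq on PATH. It creates a throwaway repository with a hypothetical multi-line text file, makes one pure rename and one rename plus a one-line edit, and prints git diff --name-status -M for each commit. The function name rename_status_demo is illustrative.

```shell
#!/bin/sh
# Sketch: tell a pure rename apart from "rename + edit" with rename detection.
# Assumes a POSIX shell, git and seq on PATH; works in a throwaway temp repository.
set -eu

rename_status_demo() (
  repo=$(mktemp -d)
  trap 'rm -rf "$repo"' EXIT
  cd "$repo"
  git init -q
  git config user.name demo
  git config user.email demo@example.com

  seq 1 100000 > foo            # hypothetical multi-line text data file
  git add foo
  git commit -q -m "add foo"

  git mv foo bar                # pure rename
  git commit -q -m "rename only"
  git diff --name-status -M HEAD~1 HEAD

  git mv bar baz                # rename ...
  echo "one more line" >> baz   # ... plus a small edit
  git add baz
  git commit -q -m "rename and edit"
  git diff --name-status -M HEAD~1 HEAD
)

rename_status_demo
```

A pure rename shows up as R100 (identical content, same blob). Any edit lowers the score below 100, and the edited version is stored as a new blob.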
In either case, you can try to shrink the .git directory with the command git gc --prune --aggressive.
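To see what git gc actually changes, you can compare the output of git count-objects -v before and after running it. This is a sketch under assumptions: a POSIX shell with git and seq on PATH, a throwaway repository, and the hypothetical helper names objects_field and gc_demo. It uses --prune=now so that the result does not depend on the default grace period.

```shell
#!/bin/sh
# Sketch: inspect object storage before and after `git gc`.
# Assumes a POSIX shell, git and seq on PATH; works in a throwaway temp repository.
set -eu

# Print one field (e.g. "count", "in-pack") from `git count-objects -v`.
objects_field() {
  git count-objects -v | awk -v key="$1:" '$1 == key { print $2 }'
}

gc_demo() (
  repo=$(mktemp -d)
  trap 'rm -rf "$repo"' EXIT
  cd "$repo"
  git init -q
  git config user.name demo
  git config user.email demo@example.com

  seq 1 100000 > foo            # hypothetical text data file
  git add foo
  git commit -q -m "add foo"
  git mv foo bar
  git commit -q -m "modify"

  echo "before gc: loose=$(objects_field count) packed=$(objects_field in-pack)"
  # --prune=now drops unreachable loose objects immediately; every object
  # here is reachable, so gc only moves them into a pack.
  git gc --quiet --aggressive --prune=now
  echo "after gc: loose=$(objects_field count) packed=$(objects_field in-pack)"
)

gc_demo
```

The same five objects (one blob, two trees, two commits) move from loose storage into a single pack. The rename never duplicated the blob in the first place.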
EDIT:
git mv foo bar
git commit -m "modify"
git cat-file -s HEAD:bar
These commands create a new commit, but since the content of foo/bar has not changed, Git stores nothing new except the new file name (a new tree object and the commit itself). In fact, in your example, git cat-file -s HEAD:foo before the rename and git cat-file -s HEAD:bar after it give the same result, since it is the same content (the same blob in .git/objects). git cat-file -s reports the size of that blob, i.e. of the file content, not the amount of data the commit added.
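The difference between the size of a blob and the size of what a commit adds can be seen by asking git cat-file for the type and size of several object names. A sketch, assuming a POSIX shell with git on PATH, a throwaway repository, and a hypothetical 1 MiB file. The function name sizes_demo is illustrative.

```shell
#!/bin/sh
# Sketch: what `git cat-file -s` measures for different object names.
# Assumes a POSIX shell and git on PATH; works in a throwaway temp repository.
set -eu

sizes_demo() (
  repo=$(mktemp -d)
  trap 'rm -rf "$repo"' EXIT
  cd "$repo"
  git init -q
  git config user.name demo
  git config user.email demo@example.com

  head -c 1048576 /dev/zero | tr '\0' 'x' > foo   # hypothetical 1 MiB file
  git add foo
  git commit -q -m "add foo"
  git mv foo bar
  git commit -q -m "modify"

  # Each line: <object name> <object type> <size in bytes>
  for name in HEAD~1:foo HEAD:bar 'HEAD^{tree}' HEAD; do
    echo "$name $(git cat-file -t "$name") $(git cat-file -s "$name")"
  done
)

sizes_demo
```

Both blob lines report the 1 MiB of file content, which is the number measured in the question. The new root tree and the commit object, which are what the rename actually added, are only tens to a few hundred bytes.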
I think you are misinterpreting what Git does internally. Have a look at Git Objects for further explanation.
Remember that git tracks content, not files.
Moving things around in Git does not change the size of the repository. Each distinct file content is stored exactly once in the repository. You only increase the size of the repository if you start changing those huge files; then each new version is stored separately.
Have a look at git-annex, maybe that is the right thing for you.