Moving a large number of large files in a git repository

Posted 2019-02-27 04:59

Question:

My repository has a large number of large files. They are mostly data (text). Sometimes I need to move these files to another location because of refactoring or packaging.

I use the git mv command to "rename" the paths of the files, but it seems inefficient: the size of the commit (the actual diff size) is huge, the same as with rm followed by git add.

Are there other ways to reduce the commit size? Or should I just add the files to .gitignore and upload them upstream as a zip file?


Thank you for the answers.

FYI, the following series of commands prints the size of the file bar:

git mv foo bar
git commit -m "modify"
git cat-file -s HEAD:bar

From this output I assumed that git did an rm followed by an add. Could you tell me whether this number is actually related to the size of the commit?

Answer 1:

By design, if you move a file inside a Git repository without changing its content, creating a commit only stores new metadata (a.k.a. tree objects) to represent the new file location. Since the content is unchanged, Git doesn't need to create a new blob object to store the file content. So the "commit size" should be rather small.
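The point above can be checked in a throwaway repository. This is a minimal sketch, assuming git is installed; the temporary directory, the placeholder identity and the 1 MB test file are made up for illustration. It compares the blob ID of the file before and after a pure move.

```shell
set -e
repo=$(mktemp -d)                       # throwaway repository (hypothetical location)
cd "$repo"
git init -q
git config user.name "Example"          # placeholder identity, local to this repo
git config user.email "example@example.com"

# Create a text file of about 1 MB and commit it as foo.
head -c 1000000 /dev/zero | tr '\0' 'x' > foo
git add foo
git commit -q -m "add foo"
blob_before=$(git rev-parse HEAD:foo)   # ID of the blob holding the content

# Move the file without touching its content.
git mv foo bar
git commit -q -m "move foo to bar"
blob_after=$(git rev-parse HEAD:bar)

# Both lines should show the same ID: the second commit reuses the blob.
echo "before: $blob_before"
echo "after:  $blob_after"
```

If the two IDs differ in your real repository, the content changed along with the move (for example through line-ending conversion or an edit), and that is what makes the commit large.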

Since you say that diff size is huge, I suppose that some file content is modified along with relocation. This would be a reason for "commit size" to be huge.

In either case, you can try to shrink the .git directory with the command git gc --prune --aggressive
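To see what that command does, you can compare the output of git count-objects before and after it. The following is only a sketch in a scratch repository (the directory, identity and file contents are placeholders); in a real repository you would run just the gc and count-objects lines.

```shell
set -e
repo=$(mktemp -d)                       # scratch repository (hypothetical location)
cd "$repo"
git init -q
git config user.name "Example"          # placeholder identity
git config user.email "example@example.com"

printf 'some data\n' > foo
git add foo
git commit -q -m "add foo"
git mv foo bar
git commit -q -m "move foo to bar"

# "count" is the number of loose (unpacked) objects.
loose_before=$(git count-objects -v | awk '/^count:/ {print $2}')

# Same command as in the answer, plus --quiet to silence progress output.
git gc --prune --aggressive --quiet

loose_after=$(git count-objects -v | awk '/^count:/ {print $2}')
in_pack=$(git count-objects -v | awk '/^in-pack:/ {print $2}')

echo "loose objects before gc: $loose_before"
echo "loose objects after gc:  $loose_after"
echo "objects in packs:        $in_pack"
```

Note that gc only repacks and compresses objects; anything still reachable from your history stays in the repository.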

EDIT :

git mv foo bar
git commit -m "modify"
git cat-file -s HEAD:bar

These commands create a new commit, but since the content of the foo/bar file has not changed, Git won't store anything new except the new file name. In fact, in your example, git cat-file -s HEAD:foo before the rename and git cat-file -s HEAD:bar after it will give you the same result, since it's the same content (the same blob in .git/objects). I think you are misinterpreting what git does internally. Have a look at Git objects for further explanation.

Remember that git tracks content, not files.
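As a rough illustration of why git cat-file -s HEAD:bar is not the "commit size": it reports the size of the file's blob, while the objects the move commit actually adds are the commit and its tree. The sketch below assumes a scratch repository with a made-up 1 MB file and placeholder identity.

```shell
set -e
repo=$(mktemp -d)                       # scratch repository (hypothetical location)
cd "$repo"
git init -q
git config user.name "Example"          # placeholder identity
git config user.email "example@example.com"

head -c 1000000 /dev/zero | tr '\0' 'x' > foo
git add foo
git commit -q -m "add foo"
git mv foo bar
git commit -q -m "modify"

# Size of the file content (the blob), before and after the move.
size_foo=$(git cat-file -s HEAD~1:foo)
size_bar=$(git cat-file -s HEAD:bar)

# Sizes of the objects the second commit actually created.
commit_size=$(git cat-file -s HEAD)
tree_size=$(git cat-file -s 'HEAD^{tree}')

# How git itself classifies the change when rename detection (-M) is on.
summary=$(git diff -M --summary HEAD~1 HEAD)

echo "blob size as foo: $size_foo bytes"
echo "blob size as bar: $size_bar bytes"
echo "commit object:    $commit_size bytes"
echo "tree object:      $tree_size bytes"
echo "diff summary:    $summary"
```

The two blob sizes match because they describe the same object, while the commit and tree objects are tiny by comparison.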



Answer 2:

Moving things around in git does not noticeably change the size of the repository. Each distinct file content is stored exactly once in the repository. You will only increase the size of the repository if you start to change those huge files, because each new version is then stored separately.
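One way to observe this is to count the objects in a scratch repository after each step. This is a sketch under assumptions: count_objects is a hypothetical helper name, and the directory, identity and file contents are placeholders.

```shell
set -e
repo=$(mktemp -d)                       # scratch repository (hypothetical location)
cd "$repo"
git init -q
git config user.name "Example"          # placeholder identity
git config user.email "example@example.com"

# Hypothetical helper: number of loose objects under .git/objects.
count_objects() { git count-objects -v | awk '/^count:/ {print $2}'; }

printf 'some data\n' > foo
git add foo
git commit -q -m "add foo"
n_initial=$(count_objects)              # blob + tree + commit

git mv foo bar
git commit -q -m "move foo to bar"
n_moved=$(count_objects)                # adds a tree and a commit, no blob

printf 'more data\n' >> bar
git commit -q -am "change bar"
n_changed=$(count_objects)              # adds a blob, a tree and a commit

echo "objects after add:    $n_initial"
echo "objects after move:   $n_moved"
echo "objects after change: $n_changed"
```

Only the last step adds a new blob, and with huge files it is that new blob that makes the repository grow.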

Have a look at git-annex, maybe that is the right thing for you.