I want to convert an existing codebase to git. It contains big binary library files, which are external (vendor) dependencies needed only to link the final application. These binaries are huge (2.2 GB), so to keep the main repo from growing unduly, I would like to host them in a separate git repo and use a submodule to reference only the latest version of the library binaries.
I can set up the shallow submodule correctly, but I do not know how to update it to the latest version when the binary repo (with full history) changes.
The repo structure I have is similar to this:
main_project
    sub_binary
    other project files
    ...
Here are the commands that allowed me to create a shallow submodule:
cd main_project
git submodule add --depth 1 file://remote_binary_repo_path sub_binary
This works and the sub_binary is pinned to the correct revision.
How do I update the shallow submodule sub_binary (and record this in the main repo) to the latest version (and only the latest revision) if the remote library repo gets updated?
Notes:
- If I run git log in sub_binary right after the initial submodule setup, I get the expected history of one commit.
- When I try to run git pull --depth 1 in sub_binary, I get a merge error: Automatic merge failed; fix conflicts and then commit the result.
- I am using git 1.8.4.
- I have read VonC's answer to Git Shallow Submodules, but it does not mention how to update such a submodule.
Edit:
I have been able to update the submodule after a lot of git learning (see my own answer). But there is still the issue that the main repo grows as new versions are fetched.
As a test, I have a binary file, 2 MB in size, and I clone its repo shallowly to create a submodule.
du -h at initial clone, after a git submodule update --init --depth 1:
40K ./.git/hooks
4.0K ./.git/info
4.0K ./.git/logs/refs/heads
4.0K ./.git/logs/refs/remotes/origin
4.0K ./.git/logs/refs/remotes
8.0K ./.git/logs/refs
12K ./.git/logs
40K ./.git/modules/sub_binary/hooks
4.0K ./.git/modules/sub_binary/info
4.0K ./.git/modules/sub_binary/logs/refs/heads
4.0K ./.git/modules/sub_binary/logs/refs/remotes/origin
4.0K ./.git/modules/sub_binary/logs/refs/remotes
8.0K ./.git/modules/sub_binary/logs/refs
12K ./.git/modules/sub_binary/logs
0B ./.git/modules/sub_binary/objects/info
2.0M ./.git/modules/sub_binary/objects/pack
2.0M ./.git/modules/sub_binary/objects
4.0K ./.git/modules/sub_binary/refs/heads
4.0K ./.git/modules/sub_binary/refs/remotes/origin
4.0K ./.git/modules/sub_binary/refs/remotes
0B ./.git/modules/sub_binary/refs/tags
8.0K ./.git/modules/sub_binary/refs
2.1M ./.git/modules/sub_binary
2.1M ./.git/modules
4.0K ./.git/objects/70
4.0K ./.git/objects/de
4.0K ./.git/objects/info
8.0K ./.git/objects/pack
20K ./.git/objects
4.0K ./.git/refs/heads
4.0K ./.git/refs/remotes/origin
4.0K ./.git/refs/remotes
0B ./.git/refs/tags
8.0K ./.git/refs
2.2M ./.git
2.0M ./sub_binary
4.2M .
du -h after two or three update cycles:
40K ./.git/hooks
8.0K ./.git/info
4.0K ./.git/logs/refs/heads
4.0K ./.git/logs/refs
8.0K ./.git/logs
40K ./.git/modules/sub_binary/hooks
8.0K ./.git/modules/sub_binary/info
0B ./.git/modules/sub_binary/logs/refs/heads
8.0K ./.git/modules/sub_binary/logs/refs/remotes/origin
8.0K ./.git/modules/sub_binary/logs/refs/remotes
8.0K ./.git/modules/sub_binary/logs/refs
12K ./.git/modules/sub_binary/logs
4.0K ./.git/modules/sub_binary/objects/0a
4.0K ./.git/modules/sub_binary/objects/1b
2.0M ./.git/modules/sub_binary/objects/a0
4.0K ./.git/modules/sub_binary/objects/info
4.0M ./.git/modules/sub_binary/objects/pack
6.0M ./.git/modules/sub_binary/objects
0B ./.git/modules/sub_binary/refs/heads
8.0K ./.git/modules/sub_binary/refs/remotes/origin
8.0K ./.git/modules/sub_binary/refs/remotes
0B ./.git/modules/sub_binary/refs/tags
8.0K ./.git/modules/sub_binary/refs
6.1M ./.git/modules/sub_binary
6.1M ./.git/modules
4.0K ./.git/objects/70
4.0K ./.git/objects/de
4.0K ./.git/objects/info
8.0K ./.git/objects/pack
20K ./.git/objects
4.0K ./.git/refs/heads
0B ./.git/refs/tags
4.0K ./.git/refs
6.2M ./.git
2.0M ./sub_binary
8.2M .
Since I fetch shallowly and reset, I would expect the repo to contain only one copy of the files plus the working directory, which would be around 4 MB.
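For reference, here is a minimal sketch of the "fetch shallowly and reset" cycle mentioned above. It assumes the remote is named origin and the branch is master; the throwaway repositories and the file lib.bin are hypothetical stand-ins for the real setup, and this is not necessarily the exact sequence used for the measurements above.

```shell
#!/bin/sh
# Sketch only: update a shallow submodule by fetching one commit and resetting to it.
# Assumptions (not from the original setup): branch "master", remote "origin",
# a tiny stand-in file lib.bin, and throwaway repos under a temp directory.
set -e

work=$(mktemp -d)
cd "$work"

# Identity for the throwaway commits
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# 1. Stand-in for the remote binary repo (the one that keeps full history)
git init -q remote_binary_repo
(
  cd remote_binary_repo
  git symbolic-ref HEAD refs/heads/master
  echo "version 1" > lib.bin
  git add lib.bin
  git commit -q -m "binary v1"
)

# 2. Main project with a shallow submodule, as in the question.
#    Recent git versions block file:// for submodules unless explicitly allowed.
git init -q main_project
cd main_project
git symbolic-ref HEAD refs/heads/master
git -c protocol.file.allow=always \
    submodule add --depth 1 "file://$work/remote_binary_repo" sub_binary
git commit -q -m "add shallow submodule"

# 3. The remote library repo gets a new version
(
  cd "$work/remote_binary_repo"
  echo "version 2" > lib.bin
  git commit -q -am "binary v2"
)

# 4. In the submodule: fetch only the newest commit, then move to it.
#    reset is used instead of pull/merge because the two shallow
#    histories share no common ancestor locally.
(
  cd sub_binary
  git fetch -q --depth 1 origin master
  git reset -q --hard FETCH_HEAD
)

# 5. Record the new submodule commit in the main repo
git add sub_binary
git commit -q -m "update sub_binary to latest"

# 6. Inspect: number of commits reachable in the submodule, and object store size
(cd sub_binary && git rev-list --count HEAD)
(cd sub_binary && git count-objects -v)
```

After step 4 the submodule history still shows a single commit, but the objects of the previous commit are not deleted by the fetch or the reset, which is consistent with the growth visible in the second du listing.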