How to update a shallow cloned submodule without i

Posted 2020-07-19 03:26

Question:

I want to convert an existing codebase to git. It contains big binary library files, which are external (vendor) dependencies and are only needed to link the final application. The binaries are huge (2.2 GB), so to keep the main repo from growing unduly, I would like to host them in a separate git repo and use a submodule that references only the latest version of the library binaries.

I can set up the shallow submodule correctly, but I do not know how to update it to the latest version when the binary repo (which has full history) changes.

The repo structure I have is similar to this:

main_project
    sub_binary
    other project files
    ...

Here are the commands that gave me a shallow submodule:

cd main_project
git submodule add --depth 1 file://remote_binary_repo_path sub_binary

This works and the sub_binary is pinned to the correct revision.

How do I update the shallow submodule sub_binary (and record this in the main_repo) to the latest version (and only the latest revision) if the remote library repo gets updated?

Notes:

  • if I do a git log in sub_binary in the initial submodule setup, I get the expected history of one commit.
  • when I try to do a git pull --depth 1 in sub_binary, I get a merge error: Automatic merge failed; fix conflicts and then commit the result.
  • I am using git 1.8.4
  • I have read VonC's answer to Git Shallow Submodules, but it does not mention how to update such a submodule.

Edit:

I have been able to update the submodule after a lot of git learning (see my own answer). But there is still the issue that the main repo grows as new versions are fetched.

As a test, I have a 2 MB binary file that I clone shallowly to create a submodule. Output of du -h for the initial clone, after git submodule update --init --depth 1:

 40K    ./.git/hooks
4.0K    ./.git/info
4.0K    ./.git/logs/refs/heads
4.0K    ./.git/logs/refs/remotes/origin
4.0K    ./.git/logs/refs/remotes
8.0K    ./.git/logs/refs
 12K    ./.git/logs
 40K    ./.git/modules/sub_binary/hooks
4.0K    ./.git/modules/sub_binary/info
4.0K    ./.git/modules/sub_binary/logs/refs/heads
4.0K    ./.git/modules/sub_binary/logs/refs/remotes/origin
4.0K    ./.git/modules/sub_binary/logs/refs/remotes
8.0K    ./.git/modules/sub_binary/logs/refs
 12K    ./.git/modules/sub_binary/logs
  0B    ./.git/modules/sub_binary/objects/info
2.0M    ./.git/modules/sub_binary/objects/pack
2.0M    ./.git/modules/sub_binary/objects
4.0K    ./.git/modules/sub_binary/refs/heads
4.0K    ./.git/modules/sub_binary/refs/remotes/origin
4.0K    ./.git/modules/sub_binary/refs/remotes
  0B    ./.git/modules/sub_binary/refs/tags
8.0K    ./.git/modules/sub_binary/refs
2.1M    ./.git/modules/sub_binary
2.1M    ./.git/modules
4.0K    ./.git/objects/70
4.0K    ./.git/objects/de
4.0K    ./.git/objects/info
8.0K    ./.git/objects/pack
 20K    ./.git/objects
4.0K    ./.git/refs/heads
4.0K    ./.git/refs/remotes/origin
4.0K    ./.git/refs/remotes
  0B    ./.git/refs/tags
8.0K    ./.git/refs
2.2M    ./.git
2.0M    ./sub_binary
4.2M    .

du -h after two or three update cycles:

 40K    ./.git/hooks
8.0K    ./.git/info
4.0K    ./.git/logs/refs/heads
4.0K    ./.git/logs/refs
8.0K    ./.git/logs
 40K    ./.git/modules/sub_binary/hooks
8.0K    ./.git/modules/sub_binary/info
  0B    ./.git/modules/sub_binary/logs/refs/heads
8.0K    ./.git/modules/sub_binary/logs/refs/remotes/origin
8.0K    ./.git/modules/sub_binary/logs/refs/remotes
8.0K    ./.git/modules/sub_binary/logs/refs
 12K    ./.git/modules/sub_binary/logs
4.0K    ./.git/modules/sub_binary/objects/0a
4.0K    ./.git/modules/sub_binary/objects/1b
2.0M    ./.git/modules/sub_binary/objects/a0
4.0K    ./.git/modules/sub_binary/objects/info
4.0M    ./.git/modules/sub_binary/objects/pack
6.0M    ./.git/modules/sub_binary/objects
  0B    ./.git/modules/sub_binary/refs/heads
8.0K    ./.git/modules/sub_binary/refs/remotes/origin
8.0K    ./.git/modules/sub_binary/refs/remotes
  0B    ./.git/modules/sub_binary/refs/tags
8.0K    ./.git/modules/sub_binary/refs
6.1M    ./.git/modules/sub_binary
6.1M    ./.git/modules
4.0K    ./.git/objects/70
4.0K    ./.git/objects/de
4.0K    ./.git/objects/info
8.0K    ./.git/objects/pack
 20K    ./.git/objects
4.0K    ./.git/refs/heads
  0B    ./.git/refs/tags
4.0K    ./.git/refs
6.2M    ./.git
2.0M    ./sub_binary
8.2M    .

Since I fetch shallowly and reset, I would expect the repo to contain only one copy of the files plus the working directory, around 4 MB in total.
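
The growth shown above is consistent with objects from earlier versions still sitting in the submodule's object store after each shallow fetch. A minimal sketch of one way to reclaim that space, assuming nothing else needs the old commits: expire the reflogs and run a garbage collection inside the submodule. The helper name prune_unreachable is hypothetical, and this has not been checked against git 1.8.4 specifically.

```shell
# Hypothetical helper: drop every object that no ref can reach any more.
# Usage: prune_unreachable sub_binary
prune_unreachable() {
    (
        cd "$1" || exit 1
        git count-objects -v                    # object store size before
        git reflog expire --expire=now --all    # forget previously checked-out commits
        git gc --quiet --prune=now              # repack and delete unreachable objects
        git count-objects -v                    # object store size after
    )
}
```

Running it from main_project as prune_unreachable sub_binary and then repeating du -h shows whether the old copies are gone. Expiring reflogs is destructive, so this only suits a repo where the old versions are truly disposable.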

Answer 1:

In my particular use case, I cannot merge or pull because of binary data. So the solution is quite simple:

cd sub_binary
git fetch --depth 1
git reset --hard origin/master
cd ..
git add sub_binary
git commit -m 'updated sub_binary'
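
The update cycle above can be tried end to end in a throwaway sandbox before touching a real repo. The following is a sketch under stated assumptions, not a definitive recipe: every path and name (work, remote_binary_repo, lib.bin, the demo identity) is made up for illustration, the branch name is read from the sandbox remote instead of assuming master, and the protocol.file.allow override is only there because recent git versions refuse file:// submodule clones by default.

```shell
#!/bin/sh
# Sandbox demo; all paths and names below are hypothetical.
set -e
work=$(mktemp -d)
cd "$work"

# Stand-in for the remote binary repo, holding one version of a library file.
git init -q remote_binary_repo
cd remote_binary_repo
git config user.email demo@example.com
git config user.name demo
echo v1 > lib.bin
git add lib.bin
git commit -q -m 'library v1'
branch=$(git symbolic-ref --short HEAD)   # master or main, depending on git config

# Main project with a shallow submodule.
cd "$work"
git init -q main_project
cd main_project
git config user.email demo@example.com
git config user.name demo
git -c protocol.file.allow=always submodule --quiet add --depth 1 \
    "file://$work/remote_binary_repo" sub_binary
git commit -q -m 'add sub_binary'

# The vendor publishes a new version.
cd "$work/remote_binary_repo"
echo v2 > lib.bin
git commit -q -am 'library v2'

# Update the submodule: shallow fetch, hard reset, record the new commit.
cd "$work/main_project/sub_binary"
git fetch -q --depth 1
git reset -q --hard "origin/$branch"
cd ..
git add sub_binary
git commit -q -m 'updated sub_binary'
```

Afterwards, git log inside sub_binary should again show a single commit, and git status in main_project should be clean.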


Answer 2:

Since submodules are almost always in detached HEAD mode, wouldn't this work:

git fetch --depth 1
git checkout origin/master

Edit:

Another thread indicates that git pull should work. Is there a linear history between the head of the remote and the head of the submodule?
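
The linear-history question can be checked directly. A minimal sketch, assuming git 1.8.0 or later for merge-base --is-ancestor; the helper name can_fast_forward is hypothetical.

```shell
# Hypothetical helper: succeeds only if the current HEAD is an ancestor of the
# given ref, i.e. moving to that ref would be a plain fast-forward.
# Usage (inside the submodule, after a fetch): can_fast_forward origin/master
can_fast_forward() {
    git merge-base --is-ancestor HEAD "$1"
}
```

In a submodule that is only ever fetched with --depth 1, the old and new tips usually share no recorded history, so the check fails and git pull attempts a real merge. That would match the conflict reported in the question.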