How to shallow pull submodule that is tracked by b

Posted 2020-07-26 04:27

Question:

Hi, I have a superproject that contains a submodule. The submodule is tracked by a branch name and not by a sha commit number. On our build server I would like to pull as little as possible, so I tried:

git submodule update --remote --init 

This, however, is not shallow. It seems to pull everything and then switch to the branch. I also tried:

git submodule update --remote --init --depth 1

This doesn't work; it fails like this:

git submodule update --remote --init --depth 1 ThirdParty/protobuf
Submodule 'ThirdParty/protobuf' (ssh://myrepo/thirdparty/protobuf.git) registered for path 'ThirdParty/protobuf'
Cloning into '/home/martin/jenkins/workspace/test_log_service/repo/ThirdParty/protobuf'...
fatal: Needed a single revision
Unable to find current origin/version/3.2.0-era revision in submodule path 'ThirdParty/protobuf'

There is a different question on shallow submodules; however, I don't see that working for branches, only for sha commits.

Answer 1:

TL;DR

I think you have hit a bug in Git. To work around it, use --no-single-branch or configure the branch manually.

Other things to know:

  • If you have recursive submodules, make sure your Git is recent and use --recommend-shallow to enable shallow submodules recursively, or --no-recommend-shallow to disable them.

  • You may need to do this in two steps. I'll show this as a two-step sequence below. I know this code has evolved a lot between Git 1.7 and current (2.26 or so) Git, and I expect the two-step sequence will work for most older versions too.

The two steps are:

N=...        # set your depth here, or expand it in the two commands
git submodule update --init --depth "$N" --no-single-branch
git submodule update --remote --depth "$N"

The Git folks have been fixing various shallow-clone submodule bugs recently as part of adding --recommend-shallow with recursive submodules, so this might all work as one command. Based on the analysis below, it should all work as one command in current Git. However, --no-single-branch fetches more objects than --single-branch.
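
To confirm that a sequence like the one above really left a submodule shallow, a quick check along these lines may help. This is a minimal sketch: the function name report_shallow is made up for illustration, and git rev-parse --is-shallow-repository needs a reasonably recent Git (2.15 or later, as far as I know).

```shell
# Hypothetical helper: report whether the repository at $1 is shallow
# and how many commits are reachable from its HEAD.
report_shallow() {
    shallow=$(git -C "$1" rev-parse --is-shallow-repository) || return 1
    count=$(git -C "$1" rev-list --count HEAD) || return 1
    echo "shallow=$shallow commits=$count"
}
```

Run from the superproject as, for example, report_shallow ThirdParty/protobuf. After a --depth 1 update I would expect it to report shallow=true with a single commit.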

Another option may be to allow single-branch mode but fix the fetch refspec in the submodule. This requires three steps—well, three separate Git commands, anyway:

N=...        # depth, as in the two-step sequence above
branch=...   # set this to the branch you want
git submodule update --init --depth "$N"
(cd path/to/submodule &&
 git config remote.origin.fetch "+refs/heads/$branch:refs/remotes/origin/$branch")
git submodule update --remote --depth "$N"

(You could do this in all submodules with git submodule foreach, but remember to pick the right branch name per-submodule.)
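
A rough sketch of that foreach variant follows. It assumes each submodule's branch is recorded in the superproject's .gitmodules file (as submodule.<name>.branch) and that each submodule has a single remote.origin.fetch entry; the function name narrow_submodule_refspecs is hypothetical. I have not run this against the repository in the question.

```shell
# Hypothetical helper: in every initialized submodule of the superproject at $1,
# point the fetch refspec at the branch recorded in .gitmodules (if any).
narrow_submodule_refspecs() {
    git -C "$1" submodule --quiet foreach '
        # $name and $toplevel are set by "git submodule foreach".
        branch=$(git config -f "$toplevel/.gitmodules" --get "submodule.$name.branch" || true)
        if [ -n "$branch" ]; then
            git config remote.origin.fetch "+refs/heads/$branch:refs/remotes/origin/$branch"
            echo "$name: fetch narrowed to $branch"
        fi
    '
}
```

Submodules with no branch entry in .gitmodules are left untouched, so they keep whatever refspec the clone gave them.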

Just in general—this is not specific to your error—I recommend avoiding shallow submodules: they tend not to work very well. If you really want to use them, use a pretty-big depth: e.g., 50, or 100, or more. Tune this based on your own repositories and needs. (Your current setup does allow --depth 1, provided you work around the other problem.)

Long: it's probably a bug in Git

Note that the analysis below is based on the source code. I have not actually tested this so it's possible I missed something. The principles are all sound, though.

All submodules are always "sha commits", or maybe "sha1" commits. Git used to call them that, but now calls them OIDs, where OID stands for Object ID. A future Git will probably use SHA-2.[1] So "OID", or "hash ID" if one wishes to avoid TLA syndrome,[2] is certainly a better term. So let me put it this way: all submodules use OID / hash-ID commits.

What do I mean by "all submodules always use OIDs / hash IDs"? Well, that's one of the keys to shallow submodules. Shallow submodules are inherently fragile, and it's tricky to get Git to use them correctly in all cases. This claim:

The submodule is tracked by a branch name and not by a sha commit number.

is wrong, in an important way. No matter how hard you try, submodules—or more precisely, submodule commits—are tracked by hash ID.
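
One way to see this for yourself is to ask the superproject what it stores for the submodule path. The sketch below wraps git ls-tree; the function name show_gitlink is made up for illustration.

```shell
# Hypothetical helper: print the entry the superproject's HEAD commit records
# for a submodule path. usage: show_gitlink <superproject-dir> <submodule-path>
show_gitlink() {
    git -C "$1" ls-tree HEAD -- "$2"
}
```

For a submodule, the entry has mode 160000 and type commit, followed by a full hash ID and the path. No branch name appears there; any branch setting lives in .gitmodules or the configuration instead.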

Now, it's true that there are branch names involved in cloning and fetching in the submodules. When you use shallow clones (--depth) with submodules, this can become very important, because most servers do not allow fetch-by-hash-ID. The depth you choose, together with the single branch name (since --depth implies --single-branch), must therefore be deep enough to reach the commit the superproject Git chooses.
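
The single-branch effect is visible in the clone's fetch refspec. Here is a small sketch for inspecting it; the function name show_fetch_refspec is hypothetical.

```shell
# Hypothetical helper: print every fetch refspec configured for "origin"
# in the repository at $1.
show_fetch_refspec() {
    git -C "$1" config --get-all remote.origin.fetch
}
```

In a single-branch clone the refspec names exactly one branch, in the form +refs/heads/<branch>:refs/remotes/origin/<branch>. A clone made with --no-single-branch has the usual wildcard form +refs/heads/*:refs/remotes/origin/*. A narrow refspec that names some other branch would be consistent with the "Unable to find current origin/... revision" failure in the question.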

If you override Git's tracked-by-hash-ID commit tracking with submodules, you can bypass one fragility issue. That's what you're doing, but you've hit a bug.


[1] And won't that be fun. Git depends rather heavily on each commit having a unique OID; the introduction of a new OID namespace, so that each object has two OIDs, with each one being unique within its namespace, means commits won't necessarily have the appropriate OID. All of the protocols get more complicated: any Git that only supports the old scheme requires a SHA-1 hash for the (single) OID, while any Git that uses the new scheme would like a SHA-2 hash, perhaps along with a SHA-1 hash to give to old Gits. Once we have the object, we can use it to compute the other hash(es), but if we only have one of the two hashes, it needs to be the right one.

The straightforward way to handle this is to put the burden of computing the "other guy's hash" on the Git that has the object, in the case of an object existing in a repository that uses a different OID namespace. But SHA-1 Gits cannot be changed, so we can't use that method. The burden has to be on new SHA-2 Gits.

[2] Note that "SHA" itself is a TLA: a Three Letter Acronym. TLAS, which stands for TLA Syndrome, is an ETLA: an Extended Three Letter Acronym.