Removing Git submodules when checking out another branch

Posted 2020-06-21 12:54

Question:

My Git repo has several feature branches.
Each feature is an external repo, plugged in as a submodule.
How do I switch correctly between branches with and without submodules?

Example:

$ git init
$ git commit -m "empty" --allow-empty
$ git checkout -b feature
$ git submodule init
$ git submodule add git://feature.git feature
$ git commit -a -m "add feature"
$ git checkout master
warning: unable to rmdir feature: Directory is not empty

The feature directory is still present in the working directory after switching to master.
How can I prevent this?

Answer 1:

The easiest way seems to be deleting the submodule directories manually before switching branches. The price is that you have to run git submodule init && git submodule update after every checkout.

To list the submodule directories recorded in .gitmodules:

grep path .gitmodules | sed 's/.*= //'

*From Prelang/gist/git-submodule-names

To remove them:

grep path .gitmodules | sed 's/.*= //' | xargs rm -rf
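
The one-liner above also matches any line that merely contains the word "path" (a URL, for instance) and splits names on whitespace. A slightly stricter variant is sketched below; the function names submodule_paths and remove_submodule_dirs are hypothetical, and it assumes it is run from the repository root with a conventionally formatted .gitmodules.

```shell
# List submodule paths from a .gitmodules file (default: ./.gitmodules).
# Only "path = ..." keys match, so a URL that contains "path" is ignored.
submodule_paths() {
    file=${1:-.gitmodules}
    # A branch without submodules has no .gitmodules: print nothing.
    [ -f "$file" ] || return 0
    sed -n 's/^[[:space:]]*path[[:space:]]*=[[:space:]]*//p' "$file"
}

# Remove every submodule work tree listed in .gitmodules.
# Reads one path per line so names containing spaces survive.
remove_submodule_dirs() {
    submodule_paths "${1:-.gitmodules}" | while IFS= read -r dir; do
        [ -n "$dir" ] || continue
        rm -rf -- "$dir"
    done
}
```

Call remove_submodule_dirs before git checkout, while the current branch's .gitmodules is still in place. Note that rm -rf discards any uncommitted changes inside the submodule work trees.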


Answer 2:

git submodule deinit .

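may do the trick: it unregisters the submodules and empties their working directories, so the next checkout can remove them.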
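
A minimal sketch of how the deinit step could be wrapped around a branch switch. The function name switch_branch and the GIT override variable are hypothetical, introduced only for illustration. It uses --all instead of "." on the assumption that Git 2.8 or newer is available, because "." is reported as an unmatched pathspec on a branch that has no submodules.

```shell
# Hypothetical helper: empty submodule work trees, switch branch, restore.
# Set GIT=echo to preview the commands instead of running them.
switch_branch() {
    target=$1
    git_cmd=${GIT:-git}
    # Unregister all submodules and empty their work trees, so that
    # checkout can remove the now-empty directories.
    # Without --force this refuses to drop local modifications.
    "$git_cmd" submodule deinit --all || return 1
    "$git_cmd" checkout "$target" || return 1
    # Re-create whatever submodules the target branch records, if any.
    "$git_cmd" submodule update --init --recursive
}
```

For example, switch_branch master on the feature branch from the question, and switch_branch feature to come back.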



Answer 3:

With Git 2.27 (Q2 2020), the situation should improve, and "git checkout --recurse-submodules" works better with a nested submodule hierarchy.

See commit 846f34d, commit e84704f, commit 16f2b6b, commit 8d48dd1, commit d5779b6, commit bd35645 (17 Feb 2020) by Philippe Blain (phil-blain).
(Merged by Junio C Hamano -- gitster -- in commit fe87060, 27 Mar 2020)
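
As a rough sketch of how the --recurse-submodules checkout mentioned above might be used, the wrapper below enables it only when the installed Git is at least 2.27. The names version_ge and checkout_recursive are hypothetical, and the fallback to a plain checkout on older versions is an assumption made for illustration, not something the commit prescribes.

```shell
# version_ge A B: succeed when dotted version A is at least B.
# Only the major and minor components are compared.
version_ge() {
    a_major=${1%%.*}; a_rest=${1#*.}; a_minor=${a_rest%%.*}
    b_major=${2%%.*}; b_rest=${2#*.}; b_minor=${b_rest%%.*}
    [ "$a_major" -gt "$b_major" ] ||
        { [ "$a_major" -eq "$b_major" ] && [ "$a_minor" -ge "$b_minor" ]; }
}

# Hypothetical wrapper: recurse into submodules on Git 2.27 or newer.
checkout_recursive() {
    # "git --version" prints e.g. "git version 2.27.0"; take the third field.
    installed=$(git --version | awk '{ print $3 }')
    if version_ge "$installed" 2.27; then
        git checkout --recurse-submodules "$@"
    else
        echo "Git $installed predates 2.27; using a plain checkout" >&2
        git checkout "$@"
    fi
}
```

Running git config submodule.recurse true once in the repository should make plain git checkout behave the same way without a wrapper.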

unpack-trees: check for missing submodule directory in merged_entry

Reported-by: Philippe Blain
Reported-by: Damien Robert
Signed-off-by: Philippe Blain

Using git checkout --recurse-submodules to switch between a branch with no submodules and a branch with initialized nested submodules currently causes a fatal error:

$ git checkout --recurse-submodules branch-with-nested-submodules
fatal: exec '--super-prefix=submodule/nested/': cd to 'nested'
       failed: No such file or directory
error: Submodule 'nested' could not be updated.
error: Submodule 'submodule/nested' cannot checkout new HEAD.
error: Submodule 'submodule' could not be updated.
M   submodule
Switched to branch 'branch-with-nested-submodules'

The checkout succeeds, but the worktree and index of the first level submodule are left empty:

$ cd submodule
$ git -c status.submoduleSummary=1 status
HEAD detached at b3ce885
Changes to be committed:
  (use "git restore --staged <file>..." to unstage)
      deleted:    .gitmodules
      deleted:    first.t
      deleted:    nested

fatal: not a git repository: 'nested/.git'
Submodule changes to be committed:

* nested 1e96f59...0000000:

$ git ls-files -s
$ # empty
$ ls -A
.git

The reason for the fatal error during the checkout is that a child git process tries to cd into the yet unexisting nested submodule directory.

The sequence is the following:

  1. The main git process (the one running in the superproject) eventually reaches write_entry() in entry.c, which creates the first level submodule directory and then calls submodule_move_head() in submodule.c, which spawns git read-tree in the submodule directory.

  2. The first child git process (the one in the submodule of the superproject) eventually calls check_submodule_move_head() at unpack_trees.c:2021, which calls submodule_move_head in dry-run mode, which spawns git read-tree in the nested submodule directory.

  3. The second child git process tries to chdir() in the yet unexisting nested submodule directory in start_command() at run-command.c and dies before exec'ing.

The reason why check_submodule_move_head() is reached in the first child and not in the main process is that it is inside an if(submodule_from_ce()) construct, and submodule_from_ce() returns a valid struct submodule pointer, whereas it returns a null pointer in the main git process.

The reason why submodule_from_ce() returns a null pointer in the main git process is because the call to cache_lookup_path() in config_from() (called from submodule_from_path() in submodule_from_ce()) returns a null pointer since the hashmap "for_path" in the submodule_cache of the_repository is not yet populated.
It is not populated because both repo_get_oid(repo, GITMODULES_INDEX, &oid) and repo_get_oid(repo, GITMODULES_HEAD, &oid) in config_from_gitmodules() at submodule-config.c return -1, as at this stage of the operation, neither the HEAD of the superproject nor its index contain any .gitmodules file.

In contrast, in the first child the hashmap is populated because repo_get_oid(repo, GITMODULES_HEAD, &oid) returns 0 as the HEAD of the first level submodule, i.e. .git/modules/submodule/HEAD, points to a commit where .gitmodules is present and records 'nested' as a submodule.

Fix this bug by checking that the submodule directory exists before calling check_submodule_move_head() in merged_entry() in the if(!old) branch, i.e. if going from a commit with no submodule to a commit with a submodule present.

Also protect the other call to check_submodule_move_head() in merged_entry() the same way as it is safer, even though the else if (!(old->ce_flags & CE_CONFLICTED)) branch of the code is not at play in the present bug.

The other calls to check_submodule_move_head() in other functions in unpack_trees.c are all already protected by calls to lstat() somewhere in the program flow so we don't need additional protection for them.

All commands in the unpack_trees machinery are affected, i.e. checkout, reset and read-tree when called with the --recurse-submodules flag.

This bug was first reported here.