I am currently importing a CVS project into Git.
After the import, I want to rewrite the history to move an existing directory into a separate submodule.
Suppose I have a structure like this:
file1
file2
file3
dir1
dir2
library
Now I want to rewrite the history so that the directory library is always a git submodule. In other words, I want to split specified directories out into their own submodules / subprojects.
This is my current code:
File rewrite-submodule (which is called)
cd project
git filter-branch --tree-filter $PWD/../$0-tree-filter --tag-name-filter cat -- --all
File rewrite-submodule-tree-filter
#!/bin/bash
function gitCommit()
{
unset GIT_DIR
unset GIT_WORK_TREE
git add -A
if [ -n "$(git diff --cached --name-only)" ]
then
# something to commit
git commit -F $_msg
fi
}
_git_dir=$GIT_DIR
_git_work_tree=$GIT_WORK_TREE
unset GIT_DIR
unset GIT_WORK_TREE
_dir=$PWD
if [ -d "library" ]
then
_msg=$(tempfile)
git log ${GIT_COMMIT}^! --format="%B" > $_msg
git rm -r --cached library
cd library
if [ -d ".git" ]
then
gitCommit
else
git init
gitCommit
fi
cd ..
export GIT_DIR=$_git_dir
export GIT_WORK_TREE=$_git_work_tree
git submodule add -f ./library
fi
GIT_DIR=$_git_dir
GIT_WORK_TREE=$_git_work_tree
This code creates the .gitmodules file, but not the submodule commit entry in the main repository (the line Subproject commit <sha1-hash> that git diff prints). The files in the directory library are also still versioned in the main repository and not in the subproject repository.
Thanks in advance for any hint
The .gitmodules file looks like this:
[submodule "library"]
path = library
url = ./library
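For context, the entry the question calls the submodule commit entry is stored as a gitlink: an index entry of mode 160000 that points at a commit rather than at a blob or tree. The following is only a minimal sketch, assuming a scratch repository and an empty tree standing in for the library content; the identity values and the commit message are placeholders, not part of the scripts above.

```shell
#!/bin/bash
set -eu

# Hypothetical throwaway repository, only for illustration.
repo=$(mktemp -d)
cd "$repo"
git init -q .

# Placeholder identity so commit-tree works without any global config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Stand-in for a commit of the library history: an empty tree plus a message.
empty_tree=$(git mktree </dev/null)
sub_commit=$(echo "library: initial import" | git commit-tree "$empty_tree")

# Record the path "library" in the index as a gitlink (mode 160000).
git update-index --add --cacheinfo 160000,"$sub_commit",library

# The index entry now carries mode 160000 and the commit id.
entry=$(git ls-files -s library)
echo "$entry"
```

A .gitmodules file alone does not create such an entry; it only describes where the submodule lives.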
I solved my own problem. Here is the solution, invoked as:
git-submodule-split library another_library
Script git-submodule-split:
#!/bin/bash
set -eu
if [ $# -eq 0 ]
then
echo "Usage: $0 submodules-to-split" >&2
exit 1
fi
export _tmp=$(mktemp -d)
export _libs="$@"
for i in $_libs
do
mkdir -p $_tmp/$i
done
git filter-branch --commit-filter '
function gitCommit()
{
git add -A
if [ -n "$(git diff --cached --name-only)" ]
then
git commit -F $_msg
fi
} >/dev/null
# from git-filter-branch
git checkout-index -f -u -a || die "Could not checkout the index"
# files that $commit removed are now still in the working tree;
# remove them, else they would be added again
git clean -d -q -f -x
_git_dir=$GIT_DIR
_git_work_tree=$GIT_WORK_TREE
_git_index_file=$GIT_INDEX_FILE
unset GIT_DIR
unset GIT_WORK_TREE
unset GIT_INDEX_FILE
_msg=$(tempfile)
cat /dev/stdin > $_msg
for i in $_libs
do
if [ -d "$i" ]
then
unset GIT_DIR
unset GIT_WORK_TREE
unset GIT_INDEX_FILE
cd $i
if [ -d ".git" ]
then
gitCommit
else
git init >/dev/null
gitCommit
fi
cd ..
rsync -a -rtu $i/.git/ $_tmp/$i/.git/
export GIT_DIR=$_git_dir
export GIT_WORK_TREE=$_git_work_tree
export GIT_INDEX_FILE=$_git_index_file
git rm -q -r --cached $i
git submodule add ./$i >/dev/null
git add $i
fi
done
export GIT_DIR=$_git_dir
export GIT_WORK_TREE=$_git_work_tree
export GIT_INDEX_FILE=$_git_index_file
if [ -f ".gitmodules" ]
then
git add .gitmodules
fi
_new_rev=$(git write-tree)
shift
# stdin was already consumed into $_msg above, so feed the saved message back in
git commit-tree "$_new_rev" "$@" < $_msg
rm $_msg
' --tag-name-filter cat -- --all
for i in $_libs
do
if [ -d "$_tmp/$i/.git" ]
then
rsync -a -i -rtu $_tmp/$i/.git/ $i/.git/
cd $i
git reset --hard
cd ..
fi
done
rm -r $_tmp
git for-each-ref refs/original --format="%(refname)" | while read i; do git update-ref -d $i; done
git reflog expire --expire=now --all
git gc --aggressive --prune=now
I have a project with a utils library that has started to be useful in other projects, and I wanted to split its history off into a submodule. I didn't think to look on SO first, so I wrote my own. It builds the history locally, so it's a good bit faster; afterwards, if you want, you can set up the helper command's .gitmodules file and such, and push the submodule histories themselves anywhere you want.
The stripped command itself is here; the documentation is in the comments of the unstripped one that follows. Run it as its own command with subdir set, like subdir=utils git split-submodule if you're splitting the utils directory. It's hacky because it's a one-off, but I tested it on the Documentation subdirectory in the Git history.
#!/bin/bash
# put this or the commented version below in e.g. ~/bin/git-split-submodule
${GIT_COMMIT-exec git filter-branch --index-filter "subdir=$subdir; ${debug+debug=$debug;} $(sed 1,/SNIP/d "$0")" "$@"}
${debug+set -x}
fam=(`git rev-list --no-walk --parents $GIT_COMMIT`)
pathcheck=(`printf "%s:$subdir\\n" ${fam[@]} \
| git cat-file --batch-check='%(objectname)' | uniq`)
[[ $pathcheck = *:* ]] || {
subfam=($( set -- ${fam[@]}; shift;
for par; do tpar=`map $par`; [[ $tpar != $par ]] &&
git rev-parse -q --verify $tpar:"$subdir"
done
))
git rm -rq --cached --ignore-unmatch "$subdir"
if (( ${#pathcheck[@]} == 1 && ${#fam[@]} > 1 && ${#subfam[@]} > 0)); then
git update-index --add --cacheinfo 160000,$subfam,"$subdir"
else
subnew=`git cat-file -p $GIT_COMMIT | sed 1,/^$/d \
| git commit-tree $GIT_COMMIT:"$subdir" $(
${subfam:+printf ' -p %s' ${subfam[@]}}) 2>&-
` &&
git update-index --add --cacheinfo 160000,$subnew,"$subdir"
fi
}
${debug+set +x}
#!/bin/bash
# Git filter-branch to split a subdirectory into a submodule history.
# In each commit, the subdirectory tree is replaced in the index with an
# appropriate submodule commit.
# * If the subdirectory tree has changed from any parent, or there are
# no parents, a new submodule commit is made for the subdirectory (with
# the current commit's message, which should presumably say something
# about the change). The new submodule commit's parents are the
# submodule commits in any rewrites of the current commit's parents.
# * Otherwise, the submodule commit is copied from a parent.
# Since the new history includes references to the new submodule
# history, the new submodule history isn't dangling, it's incorporated.
# Branches for any part of it can be made casually and pushed into any
# other repo as desired, so hooking up the `git submodule` helper
# command's conveniences is easy, e.g.
# subdir=utils git split-submodule master
# git branch utils $(git rev-parse master:utils)
# git clone -sb utils . ../utilsrepo
# and you can then submodule add from there in other repos, but really,
# for small utility libraries and such, just fetching the submodule
# histories into your own repo is easiest. Setup on cloning a
# project using "incorporated" submodules like this is:
# setup: utils/.git
#
# utils/.git:
# @if _=`git rev-parse -q --verify utils`; then \
# git config submodule.utils.active true \
# && git config submodule.utils.url "`pwd -P`" \
# && git clone -s . utils -nb utils \
# && git submodule absorbgitdirs utils \
# && git -C utils checkout $$(git rev-parse :utils); \
# fi
# with `git config -f .gitmodules submodule.utils.path utils` and
# `git config -f .gitmodules submodule.utils.url ./`; cloners don't
# have to do anything but `make setup`, and `setup` should be a prereq
# on most things anyway.
# You can test that a commit and its rewrite put the same tree in the
# same place with this function:
# testit ()
# {
# tree=($(git rev-parse `git rev-parse $1`: refs/original/refs/heads/$1));
# echo $tree `test $tree != ${tree[1]} && echo ${tree[1]}`
# }
# so e.g. `testit make~95^2:t` will print the `t` tree there and if
# the `t` tree at ~95^2 from the original differs it'll print that too.
# To run it, say `subdir=path/to/it git split-submodule` with whatever
# filter-branch args you want.
# $GIT_COMMIT is set if we're already in filter-branch, if not, get there:
${GIT_COMMIT-exec git filter-branch --index-filter "subdir=$subdir; ${debug+debug=$debug;} $(sed 1,/SNIP/d "$0")" "$@"}
${debug+set -x}
fam=(`git rev-list --no-walk --parents $GIT_COMMIT`)
pathcheck=(`printf "%s:$subdir\\n" ${fam[@]} \
| git cat-file --batch-check='%(objectname)' | uniq`)
[[ $pathcheck = *:* ]] || {
subfam=($( set -- ${fam[@]}; shift;
for par; do tpar=`map $par`; [[ $tpar != $par ]] &&
git rev-parse -q --verify $tpar:"$subdir"
done
))
git rm -rq --cached --ignore-unmatch "$subdir"
if (( ${#pathcheck[@]} == 1 && ${#fam[@]} > 1 && ${#subfam[@]} > 0)); then
# one id same for all entries, copy mapped mom's submod commit
git update-index --add --cacheinfo 160000,$subfam,"$subdir"
else
# no mapped parents or something changed somewhere, make new
# submod commit for current subdir content. The new submod
# commit has all mapped parents' submodule commits as parents:
subnew=`git cat-file -p $GIT_COMMIT | sed 1,/^$/d \
| git commit-tree $GIT_COMMIT:"$subdir" $(
${subfam:+printf ' -p %s' ${subfam[@]}}) 2>&-
` &&
git update-index --add --cacheinfo 160000,$subnew,"$subdir"
fi
}
${debug+set +x}
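The launcher line in the script above leans on two shell details that are easy to miss: ${VAR-word} expands to word only when VAR is unset, and sed 1,/SNIP/d stops at the launcher line itself, because that line contains the text SNIP. Here is a small sketch of both, assuming a hypothetical variable DEMO_COMMIT in place of GIT_COMMIT and a made-up five-line script file; it does not call git at all.

```shell
#!/bin/bash
set -eu

# ${VAR-word} yields word only when VAR is unset; once VAR is set,
# the expansion yields the value of VAR instead.
unset DEMO_COMMIT
outside=${DEMO_COMMIT-"not in a filter yet"}
DEMO_COMMIT=abc123
inside=${DEMO_COMMIT-"not in a filter yet"}
echo "$outside / $inside"

# sed 1,/SNIP/d deletes from line 1 through the next line matching SNIP.
# The line carrying the sed command matches by itself, so everything up
# to and including it is dropped and only the filter body remains.
script=$(mktemp)
cat >"$script" <<'EOF'
#!/bin/bash
# header comment
launcher line containing sed 1,/SNIP/d
echo filter body line 1
echo filter body line 2
EOF
body=$(sed '1,/SNIP/d' "$script")
rm -f "$script"
echo "$body"
```

Run directly, the real script has no GIT_COMMIT and so re-executes itself through filter-branch; inside the filter only the part after the launcher line is evaluated.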
Note: the submodule entry is only created when you run, from the parent repo:
git submodule init
git submodule update
You don't need those commands in your rewrite-submodule-tree-filter script, since it is only about correctly setting the .gitmodules file content.
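As a rough sketch of what setting the .gitmodules content can look like, the file can be written and read back with git config -f, the same form used in the comments of the split-submodule script. The scratch directory is an assumption for illustration; the section name, path and url are taken from the .gitmodules shown in the question.

```shell
#!/bin/bash
set -eu

# Hypothetical scratch directory; a real filter would run in the work tree.
work=$(mktemp -d)
cd "$work"

# Write the section shown in the question, one key at a time.
git config -f .gitmodules submodule.library.path library
git config -f .gitmodules submodule.library.url ./library

# Read the values back; this is the file "git submodule init" consults.
path=$(git config -f .gitmodules --get submodule.library.path)
url=$(git config -f .gitmodules --get submodule.library.url)
echo "path=$path url=$url"
cat .gitmodules
```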
You would execute those "git submodule" commands only when you are using the parent repo for the first time: see "Cloning a Project with Submodules".
Here is an updated answer that works for me on MacOSX. The major change is the use of pushd/popd to change directories, so that a submodule can be something like module/glop and not just glop.
#!/bin/bash
set -eu
if [ $# -eq 0 ]
then
echo "Usage: $0 submodules-to-split" >&2
exit 1
fi
export _tmp=$(mktemp -d /tmp/git-submodule-split.XXXXXX)
export _libs="$@"
for i in $_libs
do
mkdir -p $_tmp/$i
done
git filter-branch --commit-filter '
function gitCommit()
{
git add -A
if [ -n "$(git diff --cached --name-only)" ]
then
git commit -F $_msg
fi
} >/dev/null
# from git-filter-branch
git checkout-index -f -u -a || die "Could not checkout the index"
# files that $commit removed are now still in the working tree;
# remove them, else they would be added again
git clean -d -q -f -x >&2
_git_dir=$GIT_DIR
_git_work_tree=$GIT_WORK_TREE
_git_index_file=$GIT_INDEX_FILE
unset GIT_DIR
unset GIT_WORK_TREE
unset GIT_INDEX_FILE
_msg=$(mktemp /tmp/git-submodule-split-msg.XXXXXX)
cat /dev/stdin > $_msg
for i in $_libs
do
if [ -d "$i" ]
then
unset GIT_DIR
unset GIT_WORK_TREE
unset GIT_INDEX_FILE
pushd $i > /dev/null
if [ -d ".git" ]
then
gitCommit
else
git init >/dev/null
gitCommit
fi
popd > /dev/null
mkdir -p $_tmp/$i
rsync -a -rtu $i/.git/ $_tmp/$i/.git/
export GIT_DIR=$_git_dir
export GIT_WORK_TREE=$_git_work_tree
export GIT_INDEX_FILE=$_git_index_file
git rm -q -r --cached $i >&2
git submodule add ./$i $i >&2
git add $i >&2
fi
done
export GIT_DIR=$_git_dir
export GIT_WORK_TREE=$_git_work_tree
export GIT_INDEX_FILE=$_git_index_file
if [ -f ".gitmodules" ]
then
git add .gitmodules >&2
fi
_new_rev=$(git write-tree)
shift
git commit-tree -F $_msg "$_new_rev" $@;
rm -f $_msg
' --tag-name-filter cat -- --all
for i in $_libs
do
if [ -d "$_tmp/$i/.git" ]
then
rsync -a -i -rtu $_tmp/$i/.git/ $i/.git/
pushd $i
git reset --hard
popd
fi
done
rm -rf $_tmp
git for-each-ref refs/original --format="%(refname)" | while read i; do git update-ref -d $i; done
git reflog expire --expire=now --all
git gc --aggressive --prune=now
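To illustrate why pushd/popd matters for a nested path such as module/glop: cd into the path followed by cd .. climbs only one level, while popd returns to wherever pushd was called. This is a small sketch with a hypothetical temporary directory layout and no git involved.

```shell
#!/bin/bash
set -eu

# Hypothetical layout with a nested module path, as in module/glop.
root=$(mktemp -d)
mkdir -p "$root/module/glop"

# cd followed by "cd .." climbs only one level and ends up in module/.
cd "$root"
cd module/glop
cd ..
after_cd=${PWD#"$root"}

# pushd/popd restores the starting directory whatever the depth.
cd "$root"
pushd module/glop >/dev/null
popd >/dev/null
after_popd=${PWD#"$root"}

echo "after cd ..: '${after_cd}'  after popd: '${after_popd}'"
```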