Rewrite history git filter-branch create / split i

2020-02-11 08:23发布

问题:

I am currently importing a cvs project into git.
After importing, i want to rewrite the history to move an existing directory into a seperate submodule.

Suppose i have a structure like this:

file1
file2
file3
dir1
dir2
library

Now i want to rewrite the history so that the directory library is always a git submodule. Say, split out specified directories into their own submodules / subprojects

This is my currently code:

File rewrite-submodule (which is called)

cd project
git filter-branch --tree-filter $PWD/../$0-tree-filter --tag-name-filter cat -- --all

File rewrite-submodule-tree-filter

    #!/bin/bash

    function gitCommit()
    {
        unset GIT_DIR
        unset GIT_WORK_TREE
        git add -A
        if [ -n "$(git diff --cached --name-only)" ]
        then
            # something to commit
            git commit -F $_msg
        fi
    }

    _git_dir=$GIT_DIR
    _git_work_tree=$GIT_WORK_TREE
    unset GIT_DIR
    unset GIT_WORK_TREE
    _dir=$PWD

    if [ -d "library" ]
    then
        _msg=$(tempfile)
        git log ${GIT_COMMIT}^! --format="%B" > $_msg
        git rm -r --cached lib
        cd library
        if [ -d ".git" ]
        then
            gitCommit
        else
            git init
            gitCommit
        fi
        cd ..
        export GIT_DIR=$_git_dir
        export GIT_WORK_TREE=$_git_work_tree
        git submodule add -f ./lib
    fi

    GIT_DIR=$_git_dir
    GIT_WORK_TREE=$_git_work_tree
    

This code creates the .gitmodules file, but not the submodule commit entry (the line Subproject commit <sha1-hash>, output by git diff) in the main repository and the files in directory library are still versioned in the main repository and not in the subproject repository.

Thanks in advance for any hint

The .gitmodules look like this:

    [submodule "library"]
        path = library
        url = ./library
    

回答1:

I resolved my own question, here is the solution:

git-submodule-split library another_library

Script git-submodule-split:

    #!/bin/bash

    set -eu

    if [ $# -eq 0 ]
    then
        echo "Usage: $0 submodules-to-split"
    fi

    export _tmp=$(mktemp -d)
    export _libs="$@"
    for i in $_libs
    do
        mkdir -p $_tmp/$i
    done

    git filter-branch --commit-filter '
    function gitCommit()
    {
        git add -A
        if [ -n "$(git diff --cached --name-only)" ]
        then
            git commit -F $_msg
        fi
    } >/dev/null

    # from git-filter-branch
    git checkout-index -f -u -a || die "Could not checkout the index"
    # files that $commit removed are now still in the working tree;
    # remove them, else they would be added again
    git clean -d -q -f -x

    _git_dir=$GIT_DIR
    _git_work_tree=$GIT_WORK_TREE
    _git_index_file=$GIT_INDEX_FILE
    unset GIT_DIR
    unset GIT_WORK_TREE
    unset GIT_INDEX_FILE

    _msg=$(tempfile)
    cat /dev/stdin > $_msg
    for i in $_libs
    do
        if [ -d "$i" ]
        then
            unset GIT_DIR
            unset GIT_WORK_TREE
            unset GIT_INDEX_FILE
            cd $i
            if [ -d ".git" ]
            then
                gitCommit
            else
                git init >/dev/null
                gitCommit
            fi
            cd ..
            rsync -a -rtu $i/.git/ $_tmp/$i/.git/
            export GIT_DIR=$_git_dir
            export GIT_WORK_TREE=$_git_work_tree
            export GIT_INDEX_FILE=$_git_index_file
            git rm -q -r --cached $i
            git submodule add ./$i >/dev/null
            git add $i
        fi
    done
    rm $_msg
    export GIT_DIR=$_git_dir
    export GIT_WORK_TREE=$_git_work_tree
    export GIT_INDEX_FILE=$_git_index_file

    if [ -f ".gitmodules" ]
    then
        git add .gitmodules
    fi

    _new_rev=$(git write-tree)
    shift
    git commit-tree "$_new_rev" "$@";
    ' --tag-name-filter cat -- --all

    for i in $_libs
    do
        if [ -d "$_tmp/$i/.git" ]
        then
            rsync -a -i -rtu $_tmp/$i/.git/ $i/.git/
            cd $i
            git reset --hard
            cd ..
        fi
    done
    rm -r $_tmp

    git for-each-ref refs/original --format="%(refname)" | while read i; do git update-ref -d $i; done

    git reflog expire --expire=now --all
    git gc --aggressive --prune=now

    


回答2:

I have a project with a utils library that's started to be useful in other projects, and wanted to split its history off into a submodules. Didn't think to look on SO first so I wrote my own, it builds the history locally so it's a good bit faster, after which if you want you can set up the helper command's .gitmodules file and such, and push the submodule histories themselves anywhere you want.

The stripped command itself is here, the doc's in the comments, in the unstripped one that follows. Run it as its own command, with subdir set, like subdir=utils git split-submodule if you're splitting the utils directory. It's hacky because it's a one-off, but I tested it on the Documentation subdirectory in the Git history.

#!/bin/bash
# put this or the commented version below in e.g. ~/bin/git-split-submodule
${GIT_COMMIT-exec git filter-branch --index-filter "subdir=$subdir; ${debug+debug=$debug;} $(sed 1,/SNIP/d "$0")" "$@"}
${debug+set -x}
fam=(`git rev-list --no-walk --parents $GIT_COMMIT`)
pathcheck=(`printf "%s:$subdir\\n" ${fam[@]} \
    | git cat-file --batch-check='%(objectname)' | uniq`)
[[ $pathcheck = *:* ]] || {
    subfam=($( set -- ${fam[@]}; shift;
        for par; do tpar=`map $par`; [[ $tpar != $par ]] &&
            git rev-parse -q --verify $tpar:"$subdir"
        done
    ))
    git rm -rq --cached --ignore-unmatch  "$subdir"
    if (( ${#pathcheck[@]} == 1 && ${#fam[@]} > 1 && ${#subfam[@]} > 0)); then
        git update-index --add --cacheinfo 160000,$subfam,"$subdir"
    else
        subnew=`git cat-file -p $GIT_COMMIT | sed 1,/^$/d \
            | git commit-tree $GIT_COMMIT:"$subdir" $(
                ${subfam:+printf ' -p %s' ${subfam[@]}}) 2>&-
            ` &&
        git update-index --add --cacheinfo 160000,$subnew,"$subdir"
    fi
}
${debug+set +x}

#!/bin/bash
# Git filter-branch to split a subdirectory into a submodule history.

# In each commit, the subdirectory tree is replaced in the index with an
# appropriate submodule commit.
# * If the subdirectory tree has changed from any parent, or there are
#   no parents, a new submodule commit is made for the subdirectory (with
#   the current commit's message, which should presumably say something
#   about the change). The new submodule commit's parents are the
#   submodule commits in any rewrites of the current commit's parents.
# * Otherwise, the submodule commit is copied from a parent.

# Since the new history includes references to the new submodule
# history, the new submodule history isn't dangling, it's incorporated.
# Branches for any part of it can be made casually and pushed into any
# other repo as desired, so hooking up the `git submodule` helper
# command's conveniences is easy, e.g.
#     subdir=utils git split-submodule master
#     git branch utils $(git rev-parse master:utils)
#     git clone -sb utils . ../utilsrepo
# and you can then submodule add from there in other repos, but really,
# for small utility libraries and such, just fetching the submodule
# histories into your own repo is easiest. Setup on cloning a
# project using "incorporated" submodules like this is:
#   setup:  utils/.git
#
#   utils/.git:
#       @if _=`git rev-parse -q --verify utils`; then \
#           git config submodule.utils.active true \
#           && git config submodule.utils.url "`pwd -P`" \
#           && git clone -s . utils -nb utils \
#           && git submodule absorbgitdirs utils \
#           && git -C utils checkout $$(git rev-parse :utils); \
#       fi
# with `git config -f .gitmodules submodule.utils.path utils` and
# `git config -f .gitmodules submodule.utils.url ./`; cloners don't
# have to do anything but `make setup`, and `setup` should be a prereq
# on most things anyway.

# You can test that a commit and its rewrite put the same tree in the
# same place with this function:
# testit ()
# {
#     tree=($(git rev-parse `git rev-parse $1`: refs/original/refs/heads/$1));
#     echo $tree `test $tree != ${tree[1]} && echo ${tree[1]}`
# }
# so e.g. `testit make~95^2:t` will print the `t` tree there and if
# the `t` tree at ~95^2 from the original differs it'll print that too.

# To run it, say `subdir=path/to/it git split-submodule` with whatever
# filter-branch args you want.

# $GIT_COMMIT is set if we're already in filter-branch, if not, get there:
${GIT_COMMIT-exec git filter-branch --index-filter "subdir=$subdir; ${debug+debug=$debug;} $(sed 1,/SNIP/d "$0")" "$@"}

${debug+set -x}
fam=(`git rev-list --no-walk --parents $GIT_COMMIT`)
pathcheck=(`printf "%s:$subdir\\n" ${fam[@]} \
    | git cat-file --batch-check='%(objectname)' | uniq`)

[[ $pathcheck = *:* ]] || {
    subfam=($( set -- ${fam[@]}; shift;
        for par; do tpar=`map $par`; [[ $tpar != $par ]] &&
            git rev-parse -q --verify $tpar:"$subdir"
        done
    ))

    git rm -rq --cached --ignore-unmatch  "$subdir"
    if (( ${#pathcheck[@]} == 1 && ${#fam[@]} > 1 && ${#subfam[@]} > 0)); then
        # one id same for all entries, copy mapped mom's submod commit
        git update-index --add --cacheinfo 160000,$subfam,"$subdir"
    else
        # no mapped parents or something changed somewhere, make new
        # submod commit for current subdir content.  The new submod
        # commit has all mapped parents' submodule commits as parents:
        subnew=`git cat-file -p $GIT_COMMIT | sed 1,/^$/d \
            | git commit-tree $GIT_COMMIT:"$subdir" $(
                ${subfam:+printf ' -p %s' ${subfam[@]}}) 2>&-
            ` &&
        git update-index --add --cacheinfo 160000,$subnew,"$subdir"
    fi
}
${debug+set +x}


回答3:

Note: the submodule entry is only created when you do, from the parent repo a

git submodule init
git submodule update

You don't need those commands in your rewrite-submodule-tree-filter script, since it is only about setting correctly the .gitmodules file content.

You would execute those "git submodule" commands only when you are using the parent repo for the first time: see "Cloning a Project with Submodules".



回答4:

Here is an updated answer that works for me on MacOSX. The major change is the use of pushd/popd to change directories, so that a submodule can be something like module/glop and not just glop.

#!/bin/bash

set -eu

if [ $# -eq 0 ]
then
    echo "Usage: $0 submodules-to-split"
fi

export _tmp=$(mktemp -d /tmp/git-submodule-split.XXXXXX)
export _libs="$@"
for i in $_libs
do
    mkdir -p $_tmp/$i
done

git filter-branch --commit-filter '
function gitCommit()
{
    git add -A
    if [ -n "$(git diff --cached --name-only)" ]
    then
        git commit -F $_msg
    fi
} >/dev/null

# from git-filter-branch
git checkout-index -f -u -a || die "Could not checkout the index"
# files that $commit removed are now still in the working tree;
# remove them, else they would be added again
git clean -d -q -f -x >&2

_git_dir=$GIT_DIR
_git_work_tree=$GIT_WORK_TREE
_git_index_file=$GIT_INDEX_FILE
unset GIT_DIR
unset GIT_WORK_TREE
unset GIT_INDEX_FILE

_msg=$(mktemp /tmp/git-submodule-split-msg.XXXXXX)
cat /dev/stdin > $_msg
for i in $_libs
do
    if [ -d "$i" ]
    then
        unset GIT_DIR
        unset GIT_WORK_TREE
        unset GIT_INDEX_FILE
        pushd $i > /dev/null
        if [ -d ".git" ]
        then
            gitCommit
        else
            git init >/dev/null
            gitCommit
        fi
        popd > /dev/null
        mkdir -p $_tmp/$i
        rsync -a -rtu $i/.git/ $_tmp/$i/.git/
        export GIT_DIR=$_git_dir
        export GIT_WORK_TREE=$_git_work_tree
        export GIT_INDEX_FILE=$_git_index_file
        git rm -q -r --cached $i >&2
        git submodule add ./$i $i >&2
        git add $i >&2
    fi
done
export GIT_DIR=$_git_dir
export GIT_WORK_TREE=$_git_work_tree
export GIT_INDEX_FILE=$_git_index_file

if [ -f ".gitmodules" ]
then
    git add .gitmodules >&2
fi

_new_rev=$(git write-tree)
shift
git commit-tree -F $_msg "$_new_rev" $@;
rm -f $_msg
' --tag-name-filter cat -- --all

for i in $_libs
do
    if [ -d "$_tmp/$i/.git" ]
    then
        rsync -a -i -rtu $_tmp/$i/.git/ $i/.git/
        pushd $i
        git reset --hard
        popd
    fi
done
rm -rf $_tmp

git for-each-ref refs/original --format="%(refname)" | while read i; do git update-ref -d $i; done

git reflog expire --expire=now --all
git gc --aggressive --prune=now