How to track revision history of revision history?

Posted 2019-06-02 04:25

Question:

I am working on a programming tutorial project, and I want the sample source code for this tutorial to have a meaningful revision history, relevant to the tutorial's progress. Inevitably, I won't get all the tutorial commits perfectly right the first time, and I don't want that revision history cluttered with commits where I modify the tutorial's commits in a meta way. I think this means that I want two levels of version control: an inner one that is relevant to users of the tutorial, and an outer one that tracks how I rewrite the history of the inner one.

I see from other SO questions (such as 'Is it possible to have a git repo inside another git repo') that Git ignores .git within subdirectories. That would seem to preclude git for at least one of my levels of version control.

Can anyone recommend a strategy for tracking both the changes to content, and rewrites of that history?

Answer 1:

I can think of two ways to do it, both of them using some plumbing where the existing porcelain's built with other things in mind.

The first way's easiest but I only recently learned it was even possible [1] and suspect some experienced git users will regard it as a monstrosity. The thing is, here it's a very useful monstrosity, and in past debates between the two characterizations "useful" has sometimes proved more ... useful. So:


The first way to do it:

You can directly track content that is also tracked in nested repositories. Once git is tracking any content within a directory it will completely [2] ignore any repository you subsequently create there.
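To see that behaviour in isolation, here is a minimal sketch in a throwaway directory. The directory and file names (`book`, `sect1`, `notes.txt`, `extra.txt`) and the demo identity are assumptions made up for this illustration. It tracks a file in the parent first, creates the nested repository afterwards, and then lists what the parent's index holds.

```shell
#!/bin/sh
# Throwaway demo: the parent keeps tracking files in sect1/ even after
# a nested repository is created there. All names are hypothetical.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid

work=$(mktemp -d)
cd "$work"
git init -q book
cd book
mkdir sect1
echo 'outline' > sect1/notes.txt
git add .
git commit -q -m 'initial skeleton'

# Create the nested repository only after the parent tracks sect1/.
( cd sect1
  git init -q
  git add .
  git commit -q -m 'initial skeleton'
)

# Add new content inside the nested worktree, then stage from the parent.
echo 'more' > sect1/extra.txt
git add -A .

# Show the parent's index entries (mode, blob id, stage, path).
git ls-files -s
```

If the parent had not been tracking anything under `sect1` beforehand, git would instead treat the directory as an embedded repository, so the order of the two steps matters.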

It appears from your question that you've got neatly severable sections, so, from the top:

Create a perfectly ordinary repo with stub (or current) initial content

# from the top:

# create and commit the empty skeleton
git init book
cd book
mkdir -p sect{1,2,3}
touch {.,sect{1,2,3}}/.gitignore
# copy in any initial content here
git add .
git ls-files -s # to see exactly what you've done so far
git commit -m 'initial skeleton'

Create sub-repositories to independently track the individual sections

# now git is directly tracking content in each section, and commands in the 
# parent will _ignore_ the existence of any nested repositories you subsequently 
# create, but not their worktrees (because of the existing tracked content). viz.:

# same steps in every section
for section in sect{1,2,3}; do
  ( cd $section
    git init 
    git add . 
    git commit -m 'initial skeleton'
    git branch publishing-history
  )
done

Work on each section independently and freely

You now have the sections tracked in multiple repositories, and can work on each section entirely independently:

cd sect1
# work work commit commit lalala
# ... do whatever in the other repos

Publish the combined current content of all sections

When it's time to publish the current content of each subdirectory, get the content all cleaned up for publication and then, from any of the sections, just once, do

cd ..
git add -A .
git commit
published=`git rev-parse HEAD`

You're done. How to record the act:

Record the act in each section, for reference

for section in sect*; do
    cd $section
    git update-ref refs/heads/publishing-history $(
        # log where the checked-out commit was published
        git commit-tree \
                  -p publishing-history \
                  -p `git rev-parse HEAD` \
                  -m "## published in main repository commit $published ##" \
                HEAD^{tree}  # just `HEAD:` will work too
    )
    cd ..
done

There are no constraints on the commits you choose to publish or in what sequence. This is why Linus calls git a "stupid content tracker": no abstractions at the core. The branch correctly records the sequence and content and ancestry of those commits.

For reference, see the manual pages for git commit-tree and git update-ref.
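The recording step can be tried on its own in a scratch repository. This is only a sketch: the file name, commit messages, demo identity and the value of `published` are placeholders invented for the example, since there is no real parent commit here. It builds one publishing-history entry with commit-tree and moves the branch with update-ref.

```shell
#!/bin/sh
# Throwaway demo of one publishing-history entry. All names are hypothetical.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid

work=$(mktemp -d)
cd "$work"
git init -q sect1
cd sect1
echo 'draft' > chapter.txt
git add chapter.txt
git commit -q -m 'initial skeleton'
git branch publishing-history
echo 'second draft' > chapter.txt
git commit -q -am 'revise chapter'

# Placeholder for the parent-repository commit id from the publish step.
published=0123456789abcdef0123456789abcdef01234567

# New commit: same tree as HEAD, first parent is the previous log entry,
# second parent is the commit being published.
entry=$(git commit-tree 'HEAD^{tree}' \
          -p publishing-history \
          -p HEAD \
          -m "## published in main repository commit $published ##")

# Point the branch at the new entry without touching the worktree.
git update-ref refs/heads/publishing-history "$entry"
git log --oneline --graph publishing-history
```

Nothing is checked out or merged along the way, which is why the working branch stays exactly as it was.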

Building a rewritten, independent publishing history

git symbolic-ref HEAD refs/heads/newmaster

and publish, as above, any sequence of checked-out commits you like. The publishing-history branch will faithfully record exactly what you publish and when.


You can see where this is going, right? You can construct arbitrary histories from committed content with commit-tree and update-ref. If the commit sequence in the parent repo isn't what you want, replace it with an entirely different history that you do want by directly committing the correct sequence of trees. To record separate notes in the parent repository, use the publishing-history construct on it too.
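As a rough illustration of committing a chosen sequence of trees, the sketch below makes three draft commits and then builds a separate two-commit history from the first and last trees only. The branch name `rewritten`, the file name and the messages are assumptions for the example, not part of the setup above.

```shell
#!/bin/sh
# Throwaway demo: build a new history from existing trees. Names are hypothetical.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid

work=$(mktemp -d)
cd "$work"
git init -q .
for n in 1 2 3; do
  echo "step $n" > lesson.txt
  git add lesson.txt
  git commit -q -m "draft $n"
done

# Pick the trees that should appear in the rewritten history.
first=$(git rev-parse 'HEAD~2^{tree}')
last=$(git rev-parse 'HEAD^{tree}')

# Chain them: a root commit, then a child whose parent is that root.
c1=$(git commit-tree "$first" -m 'Lesson 1')
c2=$(git commit-tree "$last" -p "$c1" -m 'Lesson 2')

# Publish the chain under a branch name; the drafts are left untouched.
git update-ref refs/heads/rewritten "$c2"
git log --oneline rewritten
```

The draft commits and the rewritten ones share tree objects, so nothing is copied; only new commit objects are created.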

Just a note: if you start doing extensive history rewrites and the checkouts involved in constructing a new sequence start to seem burdensome, git's got you covered. Start from the gitcore-tutorial when you're ready.


A second way to do it

This way replaces the "Publish combined current content" step by fetching and manipulating trees from entirely separate repositories rather than using the overlaid-repositories method above. The publish step is then

cd ../main
git read-tree --empty
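for repo in sect{1,2,3}; do
    ( cd ../$repo
      git tag -f fetchme HEAD^{tree}   # tag the tree so it can be fetched by name
    )
    git fetch ../$repo fetchme
    # --prefix cannot be combined with -m; it names the target directory
    git read-tree --prefix=$repo/ FETCH_HEAD
done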
git commit

but this has the disadvantage that you'll have to explicitly synchronize not just the duplicate worktrees but also more copies of the worktrees to enable any whole-project tests without having to publish (as above) every version you're going to test.

Maybe it's just a mental-state thing, but this way looks enough clunkier to administer that I don't think it's worth pursuing.
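For anyone who wants to try the second way anyway, the read-tree step can be exercised by itself. The sketch below is a simplification: it skips the fetch and uses a tree that already exists in the same scratch repository. The repository, file and variable names are made up for the example.

```shell
#!/bin/sh
# Throwaway demo: graft an existing tree under a subdirectory of the index.
# Single repository, no fetch; all names are hypothetical.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid

work=$(mktemp -d)
cd "$work"
git init -q main
cd main
echo 'section text' > chapter.txt
git add chapter.txt
git commit -q -m 'section content'
tree=$(git rev-parse 'HEAD^{tree}')

# Empty the index, then read the tree in under sect1/.
git read-tree --empty
git read-tree --prefix=sect1/ "$tree"

# Write the index out as a tree object and list its paths.
combined=$(git write-tree)
git ls-tree -r --name-only "$combined"
```

Only the index is changed here; the worktree still holds `chapter.txt` at the top level, which is the kind of mismatch the second way leaves you to synchronize by hand.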


Random notes:

  • git clean -dfx doesn't clean out the worktrees of the nested repos; git apparently only ignores the nested .gits when it's useful. Hmm. This could be abused in useful ways.

  • If you want to protect your embedded repositories from random rm -rf sect3's you can use the method git submodule uses,

    mv .git /someplace/safer
    echo gitdir: /someplace/safer >.git
    

and reconstruction after the nuke is mkdir -p and echo

  • Someone may find a more elegant way to do this and if so I hope at least a sketch of it shows up soon; I don't see any in the above but I do tend to overengineer the hell out of things and then boil it all out.
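The gitdir note above can be walked through end to end in a scratch directory. This is a sketch only: the location held in `safe` and the file names are assumptions for the example. It moves the repository out of the worktree, simulates the accidental rm -rf, and rebuilds the worktree with mkdir -p and echo.

```shell
#!/bin/sh
# Throwaway demo: keep .git outside the worktree and recover after a wipe.
# All paths and names are hypothetical.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid

work=$(mktemp -d)
cd "$work"
git init -q sect3
cd sect3
echo 'draft' > chapter.txt
git add chapter.txt
git commit -q -m 'initial skeleton'

# Move the repository away and leave a one-line .git file pointing at it.
safe="$work/safer/sect3.git"
mkdir -p "$work/safer"
mv .git "$safe"
echo "gitdir: $safe" > .git
git log --oneline

# Simulate the accident, then rebuild the worktree.
cd "$work"
rm -rf sect3
mkdir -p sect3
cd sect3
echo "gitdir: $safe" > .git

# Restore the tracked files from the surviving repository.
git reset -q --hard
```

Anything that was never committed is of course still lost; only committed history survives the wipe.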

1 See here for the question that taught me this was possible

2 It turns out that git clean does recognize the nested repositories, and doesn't clean them. So git clean -dfx is still safe. More useful behavior :-)



Answer 2:

In git there is the slightly higher-level concept of the 'published branch', a branch with cleaned-up commits that you want the world to see; and that of the unpublished branch, which you keep as a draft area for commits you're still polishing.

The unpublished branch will usually have a lot more commits in it because you're writing away and committing often (right?!). Then you clean up using git rebase -i and push the clean history into the published branch, then push that out into the remote repo for everyone's enjoyment.
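A small sketch of that two-branch workflow follows. Because git rebase -i needs an editor, this example swaps in git merge --squash as a non-interactive stand-in for the clean-up step; that is a different command from the one described above, chosen only so the script can run unattended. The branch names `draft` and `published` and the file name are assumptions.

```shell
#!/bin/sh
# Throwaway demo: messy draft branch, clean published branch.
# Uses merge --squash instead of rebase -i; all names are hypothetical.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid

work=$(mktemp -d)
cd "$work"
git init -q .
echo 'intro' > lesson.txt
git add lesson.txt
git commit -q -m 'Tutorial skeleton'
git branch -M published

# Commit early and often on the draft branch.
git checkout -q -b draft
for n in 1 2 3; do
  echo "attempt $n" >> lesson.txt
  git commit -q -am "wip $n"
done

# Collapse the draft work into one clean commit on the published branch.
git checkout -q published
git merge -q --squash draft
git commit -q -m 'Lesson 1: first example'
git log --oneline published
```

From here the published branch is the one to push to the shared remote, while the draft branch stays local.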

More details at the following page, which is also just in general a great collection of git best practices: http://sethrobertson.github.io/GitBestPractices/#sausage