Suppose I have a git repo containing a file text.txt inside a directory dir_a. Later on, I decide to move text.txt to a new directory called dir_b.
After a while, I decide I should split dir_b into its own standalone git repository using git subtree split. By default, the earliest commit in dir_b's repository is the commit where I moved text.txt from dir_a to dir_b, which is unfortunate because, e.g., a blame won't work as intended.
Is there a way to preserve, in the new git repo, the changes made to text.txt when it was still in dir_a?
To be clear: in the original repository, the commit where I moved text.txt from dir_a to dir_b is successfully detected as a rename, so e.g. git diff works properly there. My problem is that the commits made before the move aren't carried over to the new repository.
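A toy model makes it easier to see why the pre-move commits disappear. The sketch below is a simplified stand-in, not how git subtree is actually implemented: it assumes a linear history whose snapshots are plain Python dicts of path to contents, and the commit IDs c1 to c4 and the helper split_prefix are hypothetical. It keeps only files under the prefix, strips the prefix, and skips commits where nothing under the prefix changed.

```python
# Toy model of prefix-based splitting on a linear history (NOT git's code).
# Assumption: each commit is (commit_id, snapshot), where the snapshot maps
# full paths to file contents. Real Git stores trees and blobs instead.
from typing import Dict, List, Tuple

Snapshot = Dict[str, str]
Commit = Tuple[str, Snapshot]


def split_prefix(history: List[Commit], prefix: str) -> List[Commit]:
    """Copy commits, keeping only files under `prefix`, with the prefix stripped.

    Commits with no files under the prefix, or whose stripped snapshot is
    identical to the previously copied one, are skipped.
    """
    lead = prefix.rstrip("/") + "/"
    copies: List[Commit] = []
    previous: Snapshot = {}
    for commit_id, snapshot in history:  # oldest first
        stripped = {
            path[len(lead):]: content
            for path, content in snapshot.items()
            if path.startswith(lead)
        }
        if not stripped or stripped == previous:
            continue  # nothing under the prefix, or nothing changed there
        copies.append((commit_id + "'", stripped))
        previous = stripped
    return copies


# The scenario from the question: text.txt is edited in dir_a, then moved.
history: List[Commit] = [
    ("c1", {"dir_a/text.txt": "v1"}),
    ("c2", {"dir_a/text.txt": "v2"}),
    ("c3", {"dir_b/text.txt": "v2"}),  # the move
    ("c4", {"dir_b/text.txt": "v3"}),
]

for commit_id, snapshot in split_prefix(history, "dir_b"):
    print(commit_id, snapshot)
# c3' {'text.txt': 'v2'}
# c4' {'text.txt': 'v3'}
```

Splitting on dir_a instead would keep c1' and c2' and drop the later commits, so in this model neither split alone contains the whole life of text.txt.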
Edit: I completely missed the git subtree split -P prefix part of this. The original answer still applies, but with a possibly fatal twist.
When you run git subtree split -P prefix [options] [commit-range], you are telling Git to copy some commits to new ones. You have Git copy whatever commits contain any files within the given prefix, but with these changes: any file whose path does not start with prefix is left out of the copy, and each remaining file has prefix (and a slash) taken off its path.
(You could do this with git filter-branch as well, although it would be slower than git subtree split, and it requires that you first create a new branch to filter.)
The result is a new, disjoint commit graph (or subgraph, since it's now added to your main commit graph), rooted at the first-copied commit and terminating at the last-copied one. (The copy process has to enumerate commits, in Git's usual backwards fashion, from a single tip commit, not from multiple tip commits. Once all commits have been found this way, the copying goes from the root, which is the last-enumerated commit, to the tip, as it must.) You can then give this new subgraph a branch name using git subtree's -b branch option. If you don't give it a name, you have a short period (14 days by default) during which you can do something with the tip commit hash ID that git subtree split prints; after that, the copies are eligible for automatic garbage collection.
As a brief illustration, consider the following graph:
    A--B--C--D--E--H--I--J--K
                  /
              F--G
Let's say commit A has only a README in it, B adds the first part of the project, C-D-E is more of the project, F and G were from a feature branch and add a subtree named subbie containing various files, H merges the subtree, in I subbie is renamed to feature, in J nothing happens to it, and in K feature/README_TOO is added.
If you now split feature as a subtree, this makes Git copy these commits:
    I: feature first appears as a name, containing, e.g., feature/__init__.py and feature/impl.py.
    K: feature/README_TOO appears.
As a new, independent subgraph of commits, it looks like this:
    I'--K'
Note that we did not copy F, G, and H: they do not have files whose names start with feature/. Commit J does have such files, but they are the same as they were in commit I, so we skipped it. Meanwhile, the names of the files in commits I' and K' are not feature/__init__.py and so on, but rather simply __init__.py and so on.
As I noted in the original answer, the history in a repository is the commits. We view the history by starting from a branch tip commit and working backwards. If we start at K' and work backwards to I', the history is just those two commits. To discover the rename, we would also have to copy commits F and G at least, and maybe H as well (there's nothing for H to merge this time, as we would skip A-B-C-D-E, so we'd probably just drop H entirely). But to do that, we would have to know to preserve subbie/*.
You could modify the git subtree code to allow additional preserved-as-is prefix arguments. There is no clear way to reverse this after the fact, though. The basic git subtree code relies on a unique prefix: it is always stripped off, so to reverse the transformation, we always add it back. The two obvious options are: never strip any prefix (so never add anything back), or require that additional, non-stripped prefixes never "collide with" prefix-stripped names. With the second option, reversing stays unambiguous: given any arbitrary copied commit, if its snapshot has a file named pa/th/to/file.ext, either pa/th/to is not a "preserved in place" prefix (so it gets the -P prefix added back), or else pa/th/to is such a prefix (so it gets nothing added).
Original answer
In Git, files don't have history. There is nothing to preserve!
In Git, only commits have (or rather, are) history. Each commit is a complete snapshot of a source tree, plus some metadata: a name, email address, and timestamp for the author of the commit; another name/email/timestamp triple for the committer; a commit log message; and, crucial for forming history, the ID of a parent commit.
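Here is a minimal sketch of that idea, assuming a toy representation: snapshots are dicts of path to contents, commit IDs are short hypothetical labels instead of real hash IDs, and history is a made-up helper that lists the commits reachable from a tip by following parent IDs backwards. Nothing per-file is stored anywhere; the list of commits is all the history there is. (git log has its own ordering rules; this walk is simply depth-first.)

```python
# Toy model of a Git commit: a complete snapshot plus metadata and parent IDs.
# Assumptions: snapshots are dicts of path -> contents, and commit IDs are
# short hypothetical labels, not real hash IDs.
from dataclasses import dataclass
from typing import Dict, List, Tuple

Person = Tuple[str, str, int]  # (name, email, timestamp)


@dataclass
class Commit:
    tree: Dict[str, str]      # the full snapshot, not a list of changes
    author: Person
    committer: Person
    message: str
    parents: Tuple[str, ...]  # parent commit IDs; empty for a root commit


def history(store: Dict[str, Commit], tip: str) -> List[str]:
    """List the commit IDs reachable from `tip` by following parents backwards."""
    order: List[str] = []
    seen = set()
    stack = [tip]
    while stack:
        cid = stack.pop()
        if cid in seen:
            continue
        seen.add(cid)
        order.append(cid)
        # Push parents reversed so the first parent is visited next.
        stack.extend(reversed(store[cid].parents))
    return order


me: Person = ("A U Thor", "author@example.com", 1700000000)
store = {
    "a": Commit({"README": "hi\n"}, me, me, "initial", ()),
    "b": Commit({"README": "hi\n", "main.py": "print(1)\n"}, me, me, "add main.py", ("a",)),
    "c": Commit({"README": "hello\n", "main.py": "print(1)\n"}, me, me, "reword README", ("b",)),
}
print(history(store, "c"))  # ['c', 'b', 'a']
```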
(Some commits, which we call merge commits, have two or more parents. At least one commit—namely the first ever made—has no parents; we call this a root commit. But most commits just have one parent, which is normally the commit that was the tip of some branch, just before the committer made a new commit that became the tip of that branch.)
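The parent count alone tells these kinds of commit apart. A small sketch, assuming a hypothetical parents-only graph (a dict from made-up commit IDs to tuples of parent IDs); kind and roots_reachable are illustrative helpers, not Git commands:

```python
# Classify commits by parent count in a parents-only model of the graph.
# Assumption: the graph is a dict from (hypothetical) commit ID to parent IDs.
from typing import Dict, Set, Tuple

Graph = Dict[str, Tuple[str, ...]]


def kind(graph: Graph, cid: str) -> str:
    """Return 'root', 'ordinary', or 'merge' based on how many parents cid has."""
    count = len(graph[cid])
    if count == 0:
        return "root"
    return "ordinary" if count == 1 else "merge"


def roots_reachable(graph: Graph, tip: str) -> Set[str]:
    """Walk backwards from tip and collect the parentless commits found."""
    seen: Set[str] = set()
    stack = [tip]
    while stack:
        cid = stack.pop()
        if cid not in seen:
            seen.add(cid)
            stack.extend(graph[cid])
    return {cid for cid in seen if not graph[cid]}


# A side branch f-g forked from b and was merged back in at h.
graph: Graph = {
    "a": (),
    "b": ("a",),
    "c": ("b",),
    "f": ("b",),
    "g": ("f",),
    "h": ("c", "g"),  # merge: first parent c, second parent g
}

for cid in sorted(graph):
    print(cid, kind(graph, cid))
print(roots_reachable(graph, "h"))  # {'a'}
```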
It's by comparing a commit against its parent that we find out what happened over time. If the previous (parent) commit had 10 files, and the subsequent (child) commit had 11 files, then someone must have added a file. If the child commit has a new line 20 in README.txt, they must have added that line. But we only discover these dynamically, by comparing parent and child. That is the history, formed by the commits.
As the git blame code works from child back to parent (and then treats that parent as the child of another parent), it searches for lines taken from other files, or for entire files renamed from one location to another. How well that search works is a separate matter, but as a general rule, if some file p/a/t/h.ext exists in the parent but not the child, and some other file n/e/w.name exists in the child but not the parent, Git will put these two files into a "candidates for rename detection" list.
If two differently-named files are absolutely, 100%, bit-for-bit identical, Git will nearly always[1] pair them up. The less identical they become, the less likely Git is to pair them up. This pairing has control knobs: in git diff and friends, the knob is the --find-renames (-M) similarity threshold. There are also --find-copies and --find-copies-harder. In git blame, the -C argument controls things, in a somewhat different way. I have not experimented enough with this to say for sure how it works, but based on the documentation, either one or two -C arguments should certainly detect a whole-file rename.
[1] For git diff, rename-finding is completely disabled by default in Git versions before 2.9, but enabled by default in Git versions 2.9 and higher. In older versions of Git, you can set diff.renames to true to enable it without configuring a particular -M/--find-renames threshold.
There is also a maximum pairing-queue size, configurable as diff.renameLimit. Hitting that limit is rare, although renaming every file in a directory (which is how Git treats renaming a directory) is more likely to hit it. The default limit has grown over the years: it used to be 100, then 200, then 400, and current versions of Git default to 1000 files.
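The candidate pairing described above can be sketched as a toy rename detector. This is not Git's algorithm: it assumes snapshots are dicts of path to contents, uses difflib's SequenceMatcher ratio in place of Git's own similarity index, pairs exact matches first, and, as a rough model of diff.renameLimit, skips inexact pairing when the number of deleted paths times the number of added paths exceeds limit squared. The 0.5 default threshold mirrors Git's default 50% rename threshold; find_renames and its parameters are hypothetical, and the limit value in the example is only illustrative.

```python
# Toy rename detection between a parent snapshot and a child snapshot.
# Assumptions (NOT Git's real algorithm): similarity is difflib's ratio()
# instead of Git's similarity index; exact matches are paired first; inexact
# pairing is skipped when len(deleted) * len(added) exceeds limit ** 2.
from difflib import SequenceMatcher
from typing import Dict, List, Tuple

Snapshot = Dict[str, str]
Pair = Tuple[str, str, float]


def find_renames(parent: Snapshot, child: Snapshot, limit: int,
                 threshold: float = 0.5) -> List[Pair]:
    # Rename candidates: paths only in the parent, and paths only in the child.
    deleted = [p for p in parent if p not in child]
    added = [p for p in child if p not in parent]
    pairs: List[Pair] = []

    # Pass 1: bit-for-bit identical contents are paired up first.
    for new in list(added):
        for old in deleted:
            if parent[old] == child[new]:
                pairs.append((old, new, 1.0))
                deleted.remove(old)
                added.remove(new)
                break

    # Pass 2: inexact pairing, skipped when there are too many candidates.
    if len(deleted) * len(added) > limit * limit:
        return pairs
    scored = sorted(
        ((SequenceMatcher(None, parent[o], child[n]).ratio(), o, n)
         for o in deleted for n in added),
        reverse=True,
    )
    used_old, used_new = set(), set()
    for score, old, new in scored:
        if score < threshold:
            break  # everything after this scores even lower
        if old in used_old or new in used_new:
            continue
        pairs.append((old, new, score))
        used_old.add(old)
        used_new.add(new)
    return pairs


parent = {"dir_a/text.txt": "line 1\nline 2\nline 3\n", "README": "x\n"}
child = {"dir_b/text.txt": "line 1\nline 2\nline 3\n", "README": "x\n"}
print(find_renames(parent, child, limit=100))
# [('dir_a/text.txt', 'dir_b/text.txt', 1.0)]
```

An exact move pairs up even with a tiny limit, while a move plus an edit relies on the inexact pass, which is the part a too-small limit switches off in this model.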