I'm tracking a Virtual PC virtual machine file (*.vmc) in git, and after making a change git identified the file as binary and wouldn't diff it for me. I discovered that the file was encoded in UTF-16.
Can git be taught to recognize that this file is text and handle it appropriately?
I'm using git under Cygwin, with core.autocrlf set to false. I could use mSysGit or git under UNIX, if necessary.
I've been struggling with this problem for a while, and just discovered (for me) a perfect solution:
git difftool takes the same arguments as git diff would, but runs a diff program of your choice instead of git's built-in diff. So pick a multibyte-aware diff (in my case, vim in diff mode) and just use git difftool instead of git diff.
Find "difftool" too long to type? No problem:
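A minimal sketch of such a setup, assuming vimdiff as the diff tool and a hypothetical alias name `dt` (both are illustrative choices). It runs in a scratch repository so that no existing configuration is touched:

```shell
# Work in a throwaway repository; use "git config --global ..." instead
# of the repo-local calls below to apply the settings everywhere.
scratch=$(mktemp -d)
cd "$scratch" && git init -q

# Assumption: vimdiff is installed. Any multibyte-aware tool that
# git difftool supports could be named here instead.
git config diff.tool vimdiff

# Do not ask for confirmation before launching the tool on each file.
git config difftool.prompt false

# Hypothetical short alias: "git dt" now behaves like "git difftool".
git config alias.dt difftool
```

With this in place, `git dt` accepts the same arguments as `git diff`, for example a pair of commits followed by a path.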
Git rocks.
Have you tried setting your .gitattributes to treat it as a text file? e.g.:
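A minimal sketch, assuming the files in question match `*.vmc` and using a hypothetical file name `machine.vmc` for the check. Setting the `diff` attribute asks git to diff matching files as text even if they look binary:

```shell
# Throwaway repository so the example is self-contained.
scratch=$(mktemp -d)
cd "$scratch" && git init -q

# Mark every *.vmc file as diffable text.
printf '%s\n' '*.vmc diff' >> .gitattributes

# Show which value git resolves for the attribute on a given path
# (the file does not need to exist for this check).
git check-attr diff -- machine.vmc
```

Note that this only changes how git classifies the file; the diff is still computed on the raw UTF-16 bytes, so the output may not display well in every terminal.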
More details at http://www.git-scm.com/docs/gitattributes.html.
Had this problem on Windows recently, and the dos2unix and unix2dos bins that ship with Git for Windows did the trick. By default they're located in C:\Program Files\Git\usr\bin\. Note that this will only work if your file doesn't need to be UTF-16. For example, someone accidentally encoded a Python file as UTF-16 when it didn't need to be (in my case).
There is a very simple solution that works out of the box on Unices.
For example, with Apple's .strings files just:
Create a .gitattributes file in the root of your repository with:
Add the following to your ~/.gitconfig file:
Source: Diff .strings files in Git (and older post from 2010).
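The two steps above might look like the following sketch. The driver name `utf16strings` is hypothetical, and the example writes repo-local configuration in a scratch repository rather than editing ~/.gitconfig directly:

```shell
# Throwaway repository so the example is self-contained.
scratch=$(mktemp -d)
cd "$scratch" && git init -q

# Step 1: route *.strings files to a named diff driver.
# "utf16strings" is a made-up name; any identifier works as long as
# both steps agree on it.
printf '%s\n' '*.strings diff=utf16strings' >> .gitattributes

# Step 2: define the driver. git runs the textconv command with the
# file path appended and diffs the command's output instead of the
# raw bytes. Add --global to store this in ~/.gitconfig.
git config diff.utf16strings.textconv 'iconv -f utf-16 -t utf-8'
```

The conversion only affects what `git diff` displays; the file stored in the repository stays UTF-16.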
By default, it looks like git won't work well with UTF-16; for such a file you have to make sure that no CRLF processing is done on it, but you want diff and merge to work as a normal text file (this is ignoring whether or not your terminal/editor can handle UTF-16).
But looking at the .gitattributes manpage, here is the custom attribute that is binary:
So it seems to me that you could define a custom attribute in your top level .gitattributes for utf16 (note that I add merge here to be sure it is treated as text):
From there you would be able to specify in any .gitattributes file something like:
Also note that you should still be able to diff a file, even if git thinks it's binary with:
Edit
This answer basically says that GNU diff with UTF-16 or even UTF-8 doesn't work very well. If you want to have git use a different tool to see differences (via --ext-diff), that answer suggests Guiffy.
But what you likely need is just to diff a UTF-16 file that contains only ASCII characters. A way to get that to work is to use --ext-diff and the following shell script:
Note that converting to UTF-8 might work for merging as well; you just have to make sure it's done in both directions.
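An external diff script of the kind described above could be sketched as follows. This is an illustration under stated assumptions, not the script from the original answer: the function name `utf16_ext_diff` is hypothetical, both file versions are assumed to be UTF-16, and `iconv` and `diff` must be available. git invokes an external diff program with seven arguments (path, old-file, old-hex, old-mode, new-file, new-hex, new-mode), so the two file versions arrive as `$2` and `$5`:

```shell
# Hypothetical helper: convert both versions to UTF-8, then diff them.
utf16_ext_diff() {
    tmp=$(mktemp -d) || return 2
    if iconv -f utf-16 -t utf-8 "$2" > "$tmp/old" &&
       iconv -f utf-16 -t utf-8 "$5" > "$tmp/new"; then
        diff -u "$tmp/old" "$tmp/new"
        rc=$?
    else
        rc=2   # conversion failed
    fi
    rm -rf "$tmp"
    # diff exits with 1 when the files differ, which is not an error
    # here; only report failure for exit codes above 1.
    [ "$rc" -le 1 ]
}
```

To use it with git, the function body would go into an executable script that ends with `utf16_ext_diff "$@"`, and that script's (hypothetical) path would be passed through the `GIT_EXTERNAL_DIFF` environment variable when running `git diff --ext-diff`.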
As for the output to the terminal when looking at a diff of a UTF-16 file:
GNU diff doesn't really care about Unicode, so when you use diff --text it just diffs and outputs the text. The problem is that the terminal you're using can't handle the UTF-16 that's emitted (combined with the diff marks that are ASCII characters).
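The custom-attribute approach from earlier in this answer could be sketched roughly as below. The macro name `utf16` follows the answer's wording, while the `*.vmc` pattern and the file name `machine.vmc` are assumptions taken from the question. The sketch uses `-text` to switch off line-ending conversion; older git versions spelled the same thing `-crlf`:

```shell
# Throwaway repository so the example is self-contained.
scratch=$(mktemp -d)
cd "$scratch" && git init -q

# Define a macro attribute in the top-level .gitattributes:
#   diff  -> diff the file as text
#   merge -> merge the file as text
#   -text -> no line-ending (CRLF) conversion
# Then apply the macro to the files that need it.
printf '%s\n' '[attr]utf16 diff merge -text' '*.vmc utf16' >> .gitattributes

# Inspect how the macro expands for a given path.
git check-attr diff merge text -- machine.vmc
```

Independently of any attributes, `git diff --text` treats every file as text for a single invocation, which is the quick way to diff a file that git considers binary.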
I have written a small git-diff driver, to-utf8, which should make it easy to diff any non-ASCII/UTF-8 encoded files. You can install it using the instructions here: https://github.com/chaitanyagupta/gitutils#to-utf8 (the to-utf8 script is available in the same repo).
Note that this script requires both file and iconv commands to be available on the system.
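To illustrate why both commands are needed, here is a minimal sketch of the general idea: detect the encoding with `file`, then convert with `iconv`. This is not the actual to-utf8 script; the function name `to_utf8_sketch` is hypothetical, and it assumes a `file` implementation that supports `--mime-encoding`:

```shell
# Hypothetical helper: print the given file as UTF-8 on stdout.
to_utf8_sketch() {
    # Ask file(1) for the encoding, e.g. "utf-16le" or "us-ascii".
    enc=$(file -b --mime-encoding "$1") || return 1
    case $enc in
        binary|unknown*|us-ascii|utf-8)
            # Nothing to convert (or nothing iconv could handle).
            cat "$1" ;;
        *)
            iconv -f "$enc" -t utf-8 "$1" ;;
    esac
}
```

A script wrapping such a function could be registered as a `textconv` command for a diff driver, so that git diffs the converted text instead of the raw bytes.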