Can I have Mac, Windows, and linux share a git rep

Posted 2019-08-20 16:52

Okay everyone: I'm setting up a git repository for researchers to share scripts and data for a research project. The researchers aren't programmers or particularly git-savvy, so I'm hoping to point desktop git clients at a shared repository — everyone has access to this in their local filesystem.

The problem: line endings. We have people using:

  • Windows (mainly R) (CRLF)
  • Linux and Mac scripts (mainly R and Python) (LF only)
  • Excel on Mac, saving as .CSV (CR only; yes, this is an actual thing)

Git's autocrlf setting only converts between CRLF and LF; it doesn't understand the CR-only Mac line endings, so it doesn't work well for me.

First, I want to track changes to these files without telling people "you can't use the tools you're familiar with" because then they will just store the data and scripts somewhere outside of the repo.

Second, I want to have the git repo not be full of stupid line ending commits and merge conflicts, because I will probably need to solve all the merge conflicts that happen.

Third, I'd like people to not have to manually run some "fix all the line endings" script because that would suck. If this is what I need to do... whatever, I guess.

Assuming "first, normalize the line endings" is the answer, any sense of which ones I should choose?

I'd thought about a pre-commit hook, but it sounds like this would involve somehow getting the same script to run on both Windows and unix, and that sounds terrible. Maybe this is a secretly practical option?

Thanks.

1 answer
Ridiculous、
#2 · 2019-08-20 17:54

First, as Marek Vitek said in the comments, you may need to write at least a tiny bit of code.

Second, for a bit of clarity, here's how Git itself deals—or doesn't deal—with data transformation:

  • Data (files) inside commits is sacrosanct. It literally can't be changed, so once something is inside a commit, it is forever.[1]

  • Data in the work-tree can and should be in a "host friendly" format. That is, if you're on a Mac running a program P_mac that requires that lines end with CR, the data can be in that format. If you're on a Windows box running the equivalent P_windows that requires that lines end with CR+LF, the data can be in that format.

  • Conversions to "host format" happen when files move from the index/staging-area to the work-tree. Conversions from "host format" to "internal storage format" happen when files move from the work-tree to the index/staging area.

Most of Git's built-in filters do only CRLF-to-LF or LF-to-CRLF transformations. There is one "bigger" built-in filter, called ident (not to be confused with indent), and you can define your own filters, called clean and smudge, which can do arbitrary things. This means you can define a smudge filter that, on the Mac (but not on Windows), will (e.g.) change LF to CR. The corresponding Mac-only clean filter might then change CR to LF.
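A clean/smudge pair along those lines might look like the following. This is only a minimal sketch, not something I have run against a real repository: the script name eolfilter.py and the function names are hypothetical, and writing it in Python is just one way to get the same script to run on Windows, Mac and Linux. The clean direction here also accepts CRLF, so the same script could serve Windows users.

```python
#!/usr/bin/env python3
"""Hypothetical Git filter script (assumed name: eolfilter.py)."""
import re
import sys

# Match CRLF before a lone CR so that one CRLF becomes one LF, not two.
_CR_OR_CRLF = re.compile(rb"\r\n|\r")


def clean(data: bytes) -> bytes:
    """Work-tree -> index: turn CRLF and CR-only endings into LF."""
    return _CR_OR_CRLF.sub(b"\n", data)


def smudge_cr(data: bytes) -> bytes:
    """Index -> work-tree on a host that wants CR-only endings."""
    return data.replace(b"\n", b"\r")


def main(mode: str) -> int:
    if mode not in ("clean", "smudge"):
        sys.stderr.write("usage: eolfilter.py clean|smudge\n")
        return 2
    # Git pipes the file content through stdin/stdout. Use the binary
    # streams so Python itself never translates line endings.
    data = sys.stdin.buffer.read()
    out = clean(data) if mode == "clean" else smudge_cr(data)
    sys.stdout.buffer.write(out)
    return 0


# Only act as a filter when invoked with exactly one argument.
if __name__ == "__main__" and len(sys.argv) == 2:
    sys.exit(main(sys.argv[1]))
```

To wire this up, a line such as "*.csv filter=eol" in .gitattributes would name the filter (the name eol is made up), and each machine would then set filter.eol.clean, and on CR-only hosts filter.eol.smudge, with git config to a command that runs the script. The exact command and script path depend on the machine, so treat them as assumptions.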

Note that many transformations are not data-preserving on raw binary data: there might be a byte that happens to resemble an LF or a CR, or two in a row that resemble CRLF, but that are not meant to be interpreted that way. If you change these, you wreck the binary data. So it's important to apply filtering only to files where a byte that seems to be one of these things really is one of these things. You can use .gitattributes path name matching, e.g., *.suffix, to select which files get which filters applied.
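To make the binary-data hazard concrete, here is a small illustration (not anything Git runs itself): a 16-bit integer whose two bytes happen to be CR and LF is destroyed by a naive CRLF-to-LF conversion.

```python
import struct

# A little-endian 16-bit integer whose bytes happen to be CR (0x0D), LF (0x0A).
packed = struct.pack("<H", 0x0A0D)
assert packed == b"\r\n"

# A naive CRLF -> LF conversion, fine for text, applied to binary data.
cleaned = packed.replace(b"\r\n", b"\n")

# The value shrank from two bytes to one and can no longer be unpacked.
print(len(packed), len(cleaned))  # prints "2 1"

try:
    struct.unpack("<H", cleaned)
    recovered = True
except struct.error:
    recovered = False
print(recovered)  # prints "False"
```

Converting the result "back" with a smudge step does not help in general, because nothing records which LF bytes originally had a CR in front of them.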

The correct filtering actions to apply will, of course, depend on the host.

Merges and "renormalize"

When doing a merge, Git normally takes the files directly from the pure versions inside each of the commits involved. Since it's Git (and git diff) doing the interpretation of lines, you generally want these to be in Git's preferred "line" format, i.e., ending with LF. (It's OK if they have or lack a CR before the LF, as long as all three versions feeding into a three-way merge have the same CR-before-LF-ness.) You can use the "renormalize" setting, though (git merge -X renormalize, or the merge.renormalize configuration option), to make Git do a virtual pass through your smudge-and-then-clean filters before it does the three-way merge. You would need this only when the existing commits (the merge base and the two branch tips) that you now intend to merge were stored differently from the way you have all now agreed to store files inside the permanent commits. (I have not actually tried any of this, but the principle is straightforward enough.)
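Why the CR-before-LF-ness has to match can be seen with a toy comparison. This is not Git's merge algorithm, just a line-by-line count under made-up file contents: a base stored with CRLF, one tip that only changed the endings to LF, and another tip with one real change.

```python
def normalize(data: bytes) -> bytes:
    """Stand-in for a clean filter: CRLF -> LF."""
    return data.replace(b"\r\n", b"\n")


def differing_lines(a: bytes, b: bytes) -> int:
    """Count LF-delimited lines that differ (same line count assumed)."""
    return sum(x != y for x, y in zip(a.split(b"\n"), b.split(b"\n")))


base = b"x <- 1\r\ny <- 2\r\n"    # committed with CRLF
ours = b"x <- 1\ny <- 2\n"        # same content, re-saved with LF
theirs = b"x <- 1\r\ny <- 3\r\n"  # one real change, still CRLF

# Raw bytes: "ours" appears to touch every line, so it collides with
# the single real change in "theirs".
print(differing_lines(base, ours), differing_lines(base, theirs))  # prints "2 1"

# After normalizing all three versions, only the real change remains.
nb, no, nt = normalize(base), normalize(ours), normalize(theirs)
print(differing_lines(nb, no), differing_lines(nb, nt))  # prints "0 1"
```

With the raw versions, both sides modify line 2 relative to the base, which is exactly the shape of a merge conflict; after normalization only one side does.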


[1] You can remove a commit, but to do so, you must also remove all of that commit's descendants. In practice, this means that commits that have been shared or pushed generally never go away; only private commits can go away or be replaced with new-and-improved commits. It's difficult to get everyone who has commit a9f3c34... to ditch it in favor of the new and improved 07115c3..., even if you can get this word out to everyone.
