When should pdf files be tracked in a Git repository

Posted 2019-02-16 16:38

Question:

I am developing a LaTeX package (http://www.openlilylib.org/lilyglyphs) which contains a number of small PDF files. Currently there are only a few dozen of them, but as the package and its user base grow there will probably be hundreds of them (but unlikely more than 1000).

The PDFs are typically only a few KB in size, but I don't know whether to track them in the Git repository. The files are subject to change at any time, but probably not too often.
Usually one is told not to track binary files which can't be diffed, but I also have read that this doesn't really matter with smaller files and a smaller overall volume. I think in the end the PDFs will sum up to not more than a few MB in total.

The package will be available as a download or through the Git repository which I prefer because using the package quite naturally leads to contributing ...
Currently, after cloning the Git repository one has to rebuild the pdfs using Python and the LilyPond notation software, so the barrier to getting started is rather high, which is why I would like to have the pdfs directly in the repo.

Any thoughts?


EDIT in response to answers/comments:

The pdf files are generated from the sources in the repository, which is why I'm reluctant to track them in Git.
But:

  • The pdfs are necessary to use the package so the user needs to have them
  • To generate the pdfs one needs Python as well as LilyPond, and neither of them is necessary to use the package. So I feel it is too big a burden to require someone to install two programs just to install my package.
    I don't see a problem requiring someone who decides to clone a Git repo to run an install script, but the software dependencies are maybe too demanding?
  • Currently generating the pdfs finishes in reasonable time because there are only a few dozen. But with a growing number of files this time could become unacceptable.

The pdf files change when they are updated/corrected. This won't happen often, and I think this is covered by tracking the source code. But the pdfs will also change whenever there is a new version of LilyPond available, which may be every two to four weeks. So while the source remains the same, the pdfs will change regularly - which is a clear indicator against tracking them with Git.
On the other hand we are talking about (possibly) a few hundred files of a few KB each, so I don't know if it's worth bothering about the issue at all.

Answer 1:

If the documents don't change, there is no reason to track their changes in git. No revisions, no need for revision control.

But if they do change over time, and someone may need to consult the old document versions for any reason, consider these questions:

  1. Is it impossible or impractical to recreate the old versions of the documents?
  2. Is there any underlying data outside of version control that has changed, or is it still in the same state?
  3. Is the data in the documents tied to source code releases?

If the answers to these questions are yes, then they may be good candidates for version control under git.



Answer 2:

The question is: do you want to use git for source code management/tracking/syncing exclusively or do you want to use it for distribution as well? For smallish projects it simplifies things to do it that way, for big projects it bloats the repo.



Answer 3:

I know this is an old post but I found it whilst searching so other people might as well. Here are some options I found.

As it has been pointed out, a lot will depend on whether these source files will change over time.

One option you have if they don't change (or change infrequently) would be to keep a copy of them on a server you control or on a Cloud storage option and make your install script download them rather than produce them.

This would probably depend on the user having wget or curl installed but most people do and if they don't, you could always prompt the user to download them manually.
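As a rough illustration of the download approach, here is a minimal sketch in Python rather than wget or curl, assuming the install script itself may be written in Python. The URL, the file names, and the function names are hypothetical placeholders, not anything taken from the actual package; a shell script using curl or wget would follow the same pattern.

```python
import urllib.request
from pathlib import Path

# Hypothetical values: replace with the real hosting location and file names.
BASE_URL = "https://example.org/lilyglyphs/pdfs/"
PDF_NAMES = ["glyph-a.pdf", "glyph-b.pdf"]


def missing_pdfs(names, target_dir):
    """Return the names that are not yet present in target_dir."""
    target = Path(target_dir)
    return [name for name in names if not (target / name).is_file()]


def download_pdfs(names, target_dir, base_url=BASE_URL):
    """Fetch each missing PDF from base_url into target_dir."""
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    for name in missing_pdfs(names, target):
        # urlretrieve saves the response body directly to the given path.
        urllib.request.urlretrieve(base_url + name, str(target / name))
```

An install script could then call something like `download_pdfs(PDF_NAMES, "pdfs")`. Files that already exist are skipped, so running it again only fetches what is missing.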

If the PDFs do change with the source frequently, you could look at Git LFS. I have never used it myself but have seen it used.