Over the years, I've always stored binary dependencies in the \lib folder and checked that into source-control with the rest of the project. I find I do this less so now that we have NuGet and NuGet Package Restore.
I've heard that some companies enforce a rule that no binaries can be checked into source control. The reasons cited include:
- Most VCS do not deal well with binaries - diffing and merging are not well supported
- Disk usage increases
- Commits and updates are slower
- The extra functionality, control and ease of use that a repository manager provides out of the box will be lost
- It encourages further bad practice; ideally projects should be looking to fully automate their builds, whereas checking into version control is typically a manual step
Are there objective arguments for or against this practice for the vast majority of projects that use source-control?
I would strongly recommend that you NOT adopt the practice you describe (the practice of forbidding binaries in source-control). Actually, I would call this an organizational anti-pattern.
The single most important rule is:
You should be able to check out a project on a new machine, and it has to compile out of the box.
If this can be done via NuGet, then fine. If not, check in the binaries. If there are any legal/license issues, then you should have at least a text file (named how_to_compile.txt or similar) in your repo that contains all the required information.
Another very strong reason to do it like this is to avoid versioning problems - or do you know
- which exact version of a certain library was in operation some years ago and
- if it REALLY was the actual version that was used in the project and
- probably most important: do you know how to get that exact version?
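One way to make the "was it REALLY that version?" question answerable is to record a checksum for every file in the lib folder and compare against it later. The following is only a minimal sketch of that idea, not something from the original setup: the function names and the choice of SHA-256 are my own assumptions, and the manifest is just a dictionary that you would store alongside the project.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(lib_dir: Path) -> dict[str, str]:
    """Map each file under lib_dir (relative POSIX path) to its digest."""
    return {
        p.relative_to(lib_dir).as_posix(): sha256_of(p)
        for p in sorted(lib_dir.rglob("*"))
        if p.is_file()
    }


def changed_files(lib_dir: Path, recorded: dict[str, str]) -> list[str]:
    """List files that are missing, added, or differ from the recorded manifest."""
    current = build_manifest(lib_dir)
    names = set(current) | set(recorded)
    return sorted(n for n in names if current.get(n) != recorded.get(n))
```

Calling `build_manifest(Path("lib"))` once and committing the result, then running `changed_files` on a later checkout, would show whether the binaries on disk are the ones the project was actually built against.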
Some other arguments against the above:
- Checking in binaries greatly facilitates build automation (and does not hinder it). This way the build system can get everything it needs from VCS without further ado. If you do it the other way, then there are always manual steps involved.
- Performance considerations are completely irrelevant as long as you work in an intranet, and only of very minor relevance when using a web-based repository (I suppose we're talking of no more than, say, 30-40 MB, which is not really a big deal for today's bandwidths).
- No functionality at all is lost. That's simply not true.
- It's also not true that normal commits etc. are slower. This is only the case when dealing with the large binaries themselves, which usually happens only once.
- And, if you have your binary dependencies checked in, you have at least some control. If you don't, you have none at all. And this surely has a much higher likelihood of errors...
Things depend on the workflow and the VCS used.
Using a component-based workflow with SVN, you check in the includes and libs of the component. This way, the libs and includes form the interface for other components. These only import the libs and includes using svn:externals, without importing the source code of the component at all. This enforces clean interfaces and a strict separation between the different components: a component is a black box that can only be used as specified in the interface. The internal structure is invisible to others. Using binaries reduces compile time and may reduce the number of tools required on a machine for compiling, since specialized tools that are required for creating a component need not be present when just using it.
However, with a distributed VCS things will not work this way. DVCS depend on cloning the whole repository. When binaries are checked in, the size of the repository will rapidly grow beyond the point where cloning just takes too long. While having SVN repositories of 100GB is not a problem, since checkouts only deal with one revision, which is smaller by several orders of magnitude, having a Git/Mercurial/Bazaar repository of that size would make it quite unusable, since cloning would take ages.
So whether checking in binaries is a good idea or not depends on your workflow and also depends on the tools used.
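To put rough numbers on the clone-versus-checkout difference, here is a back-of-the-envelope sketch. It rests on assumptions of my own: every revision of the binaries is stored in full (already-compressed binaries tend to delta poorly), the `compression_ratio` parameter is a made-up knob for how much the history shrinks, and the function name is purely illustrative.

```python
def clone_vs_checkout_mb(
    binary_mb: float, revisions: int, compression_ratio: float = 1.0
) -> tuple[float, float]:
    """Estimate (DVCS clone size, single-revision checkout size) in MB.

    Assumes each revision of the binaries is stored in full, scaled by
    compression_ratio, and that a clone also holds one working copy.
    """
    history_mb = binary_mb * revisions * compression_ratio
    clone_mb = history_mb + binary_mb  # full history plus the working copy
    checkout_mb = binary_mb            # a centralized checkout fetches one revision
    return clone_mb, checkout_mb


clone_mb, checkout_mb = clone_vs_checkout_mb(binary_mb=40, revisions=200)
print(f"clone: {clone_mb} MB, checkout: {checkout_mb} MB")
```

With 40 MB of binaries replaced 200 times, this estimate gives a clone of about 8 GB against a 40 MB checkout. Real repositories will differ, but the gap grows with every binary update.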
My own rule of thumb is that generated assets should not be version controlled (regardless of whether they're binary or textual). There are several things like images, audio/video files etc. which might be checked in, and for good reason.
As for the specific points.
On diffing and merging: you can't merge these kinds of files, but they're usually just replaced rather than piecewise merged. Diffing them might be possible for some files using custom differs, but in general this is done using some kind of metadata like version numbers.
On disk usage: if you had a large text file, disk usage would not be an argument against version control. Same here. The idea is that changes to this file need to be tracked. In the worst case, it's possible to put these assets in a separate repository (that doesn't change very often) and then include it in the current one using something like git submodules.
On commits and updates being slower: this is simply not true. Operations on that specific file might be slower, but that's okay. It would be the same for text files.
On losing the features of a repository manager: I think having things in version control increases the convenience provided by the repository manager.
On encouraging bad practice: this touches on my point that the files in question shouldn't be generated. If the files are not generated, then checkout and build is one step. There's no "download binary assets" stage.