Alternative to binaries in Subversion

Published 2019-01-13 16:27

三岁会撩人

Question:

Some of my colleagues are convinced that committing build artefacts to the subversion repository is a good idea. The argument is that this way, installation and update on the test machines is easy - just "svn up"!

I'm sure there are weighty arguments against this bad practice, but all I can think of are lame ones like "it takes up more room". What are the best, killer reasons not to do this? And what other approaches should we take instead?

This is for Java code if that makes a difference. Everything is compiled from Eclipse (with no automated PDE builds).

When I say add the build artifacts, I mean a commit would look like this:

"Added the new Whizbang feature"

 M src/foo/bar/Foo.java
 M bin/Foo.jar

Each code change has the corresponding generated jar file.

Answer 1:

In my opinion, the code repository should only contain source code, plus the third-party libraries required to compile it (although those libraries could also be retrieved by a dependency management tool during the build). The resulting binaries should not be checked in along with the source code.

I think the problem in your case is that you don't have proper build scripts in place. That's why building a binary from the sources involves manual work like starting up Eclipse, importing the project, adjusting classpaths, etc.

If there are build scripts in place, getting the binaries can be done with a command like:

svn update; ant dist

I think the most important reason not to checkin the binaries along with the source is the resulting size of your repository. This will cause:

Larger repository and maybe too little space on the versioning system server
Lots of traffic between versioning system server and the clients
Longer update times (imagine you do an SVN update from the internet...)

Another reason might be:

Source code is easily comparable, so lots of the features of a versioning system do make sense. But you can't easily compare binaries...

Also your approach as described above introduces a lot of overhead in my opinion. What if a developer forgets to update a corresponding jar file?
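If the team does keep committing jars next to the sources, the "forgot to update the jar" case can at least be detected mechanically. The Java sketch below is a hypothetical helper (`StaleJarCheck` is not part of any tool) that flags a jar that is missing or older than any `.java` file under a source root. It assumes file modification times are meaningful, which holds in a developer's own working copy; by default Subversion gives files the checkout time, so on a fresh checkout the comparison tells you little.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.FileTime;
import java.util.stream.Stream;

// Hypothetical helper: does a committed jar look older than its sources?
class StaleJarCheck {

    /** Returns true if the jar is missing or any .java file under srcRoot is newer. */
    static boolean isStale(Path srcRoot, Path jar) throws IOException {
        if (!Files.exists(jar)) {
            return true;
        }
        FileTime jarTime = Files.getLastModifiedTime(jar);
        try (Stream<Path> files = Files.walk(srcRoot)) {
            return files
                    .filter(p -> p.toString().endsWith(".java"))
                    .anyMatch(p -> modifiedAfter(p, jarTime));
        }
    }

    private static boolean modifiedAfter(Path file, FileTime reference) {
        try {
            return Files.getLastModifiedTime(file).compareTo(reference) > 0;
        } catch (IOException e) {
            // Stream lambdas cannot throw checked exceptions, so rethrow unchecked.
            throw new UncheckedIOException(e);
        }
    }
}
```

Run before each commit, this turns "did you rebuild the jar?" into a check rather than a memory test, although it cannot prove the jar was built from exactly those sources.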

Answer 2:

Firstly, Subversion (like every other such tool nowadays) is not a source code control manager (I always thought SCM meant Software Configuration Management) but a version control system. That means it stores changes to whatever you put in it, and that doesn't have to be source code: it could be image files, bitmap resources, configuration files (text or XML), all kinds of stuff. There's only one reason why built binaries shouldn't be considered part of this list, and that's because you can rebuild them.

However, think why you would want to store the released binaries in there as well.

Firstly, it's a system to assist you, not to tell you how you should build your applications. Make the computer work for you, instead of against you. So what if storing binaries takes up space? You have hundreds of gigabytes of disk space and super fast networks. It's not a big deal to store binary objects in there anymore (whereas ten years ago it might have been a problem; this is perhaps why people think of binaries in SCM as a bad practice).

Secondly, as a developer, you might be comfortable with using the system to rebuild any version of an application, but the others who might use it (e.g. QA, test, support) might not be. This means you'd need an alternative system to store the binaries, and really, you already have such a system: it's your SCM! Make use of it.

Thirdly, you assume that you can rebuild from source. Obviously you store all the source code in there, but you don't store the compiler, the libraries, the SDKs, and all the other dependent bits that are required. What happens when someone comes along and asks "can you build me the version we shipped 2 years ago? A customer has a problem with that version." Two years is an eternity nowadays; do you even have the same compiler you used back then? What happens when you check out all the source only to find that the newly updated SDK is incompatible with it and fails with errors? Do you wipe your development box and reinstall all the dependencies just to build this app? Can you even remember what all the dependencies were?!

The last point is the big one: to save a few KB of disk space, you might cost yourself days if not weeks of pain. (And Sod's law also says that whichever app you need to rebuild will be the one that required the most obscure, difficult-to-set-up dependency you were ever glad to get rid of.)
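Part of answering "can you even remember what all the dependencies were?" is writing them down at release time. A minimal Java sketch, assuming you commit a small properties file next to each release's binaries; `BuildEnvironment` is a hypothetical name, and it records only standard JVM system properties, so compiler, SDK and third-party library versions would still have to be added by your build.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.util.Properties;

// Hypothetical helper: snapshot of the toolchain that produced a release.
class BuildEnvironment {

    static Properties capture() {
        Properties env = new Properties();
        // Standard system properties that every Java platform defines.
        for (String key : new String[] {"java.version", "java.vendor", "os.name", "os.arch"}) {
            env.setProperty(key, System.getProperty(key, "unknown"));
        }
        return env;
    }

    /** Serialises the snapshot in .properties format, ready to commit with the release. */
    static String asText(Properties env) throws IOException {
        StringWriter out = new StringWriter();
        env.store(out, "Build environment for this release");
        return out.toString();
    }
}
```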

So store the binaries in your SCM, don't worry over trivialities.

PS: We put all binaries in their own 'release' directory per project. When we want to update a machine, we use a special 'setup' project that consists of nothing but svn:externals. You export the setup project and you're done, as it fetches the right things and puts them into the right directory structure.
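Since Subversion 1.5, each svn:externals line has the form `URL[@PEG] LOCALPATH` (optionally preceded by `-r REV`), so the setup project's property value can be generated rather than hand-edited. The Java sketch below is hypothetical (`SetupExternals` and the example URLs are made up); pinning every URL to a peg revision makes an export of the setup project reproducible.

```java
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper: builds the svn:externals value for a "setup" project
// that pulls each project's release directory into a local folder.
class SetupExternals {

    /**
     * @param releases    local folder name -> release directory URL (iteration order is kept)
     * @param pegRevision revision to pin every URL to, or a negative value to follow HEAD
     */
    static String externalsValue(Map<String, String> releases, long pegRevision) {
        StringJoiner lines = new StringJoiner("\n", "", "\n");
        for (Map.Entry<String, String> entry : releases.entrySet()) {
            String url = entry.getValue();
            if (pegRevision >= 0) {
                url = url + "@" + pegRevision; // peg revision pins the exact release
            }
            lines.add(url + " " + entry.getKey()); // Subversion 1.5+ syntax: URL[@PEG] LOCALPATH
        }
        return lines.toString();
    }
}
```

The generated text could be saved to a file, applied with `svn propset svn:externals -F externals.txt .` in the setup project's working copy, and committed; `svn export` of the setup project then fetches the externals too, unless `--ignore-externals` is given.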

Answer 3:

A continuous integration server like Hudson would have the ability to archive build artifacts. It doesn't help your argument with "why not" but at least it is an alternative.

Answer 4:

I'm sure there are weighty arguments against this bad practice

You have the wrong presumption that committing "build artifacts" to the version control is a bad idea (unless you wrongly phrased your question). It is not.

It is ok, and very important indeed, to keep what you call "build artifacts" in version control. More than that, you should also keep compilers and anything else used to transform the set of source files to a finished product.

Five years from now, you'll certainly be using different compilers and different build environments, which may not be able to compile today's version of your project, for whatever reason. What could be a simple, small change to fix a bug in a legacy version will turn into a nightmare of porting that old software to current compilers and build tools, just to recompile a source file that had a one-line change.

So, there is no reason you should be so afraid of storing "build artifacts" in version control. What you may want to do is to keep them in separate places.

I suggest separating them like:

 ProjectName
 |--- /trunk
 |    |--- /build
 |    |    |--- /bin        <-- compilers go here
 |    |    |--- /lib        <-- libraries (*.dll, *.jar) go here
 |    |    '--- /object     <-- object files (*.class, *.jar) go here
 |    '--- /source          <-- sources (*.java) go here
 |         |--- package1    <-- sources (*.java) go here
 |         |--- package2    <-- sources (*.java) go here

You have to configure your IDE or your build scripts to place object files in /ProjectName/trunk/build/object (perhaps even recreating the directory structure under .../source).

This way, you give your users the option to checkout either /ProjectName/trunk to get the full building environment, or /ProjectName/trunk/source to get the source of the application.

In ../build/bin and ../build/lib you must place the compilers and libraries that were used to compile the final product, the ones used to ship the software to the user. In 5 or 10 years, you will have them there, available for your use in some eventuality.
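With the layout above, every source file under `trunk/source` has a predictable output location under `trunk/build/object`. In practice you would simply pass that directory to `javac -d`, which already places class files in package subdirectories; the sketch below only makes the mapping explicit, for example for a script that checks which outputs exist. `ObjectLayout` is a hypothetical name, and extra class files for nested or anonymous classes (`Foo$Bar.class`) are not covered.

```java
import java.nio.file.Path;

// Hypothetical helper: maps trunk/source/<pkg>/Foo.java to trunk/build/object/<pkg>/Foo.class.
class ObjectLayout {

    static Path classFileFor(Path trunk, Path sourceFile) {
        Path sourceRoot = trunk.resolve("source");
        Path objectRoot = trunk.resolve("build").resolve("object");
        Path relative = sourceRoot.relativize(sourceFile); // e.g. package1/Foo.java
        String name = relative.getFileName().toString();
        if (!name.endsWith(".java")) {
            throw new IllegalArgumentException("not a Java source: " + sourceFile);
        }
        String classFile = name.substring(0, name.length() - ".java".length()) + ".class";
        // Keep the package directories, swap the file name.
        return objectRoot.resolve(relative).resolveSibling(classFile);
    }
}
```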

Answer 5:

"committing build artifacts to the subversion repository" can be a good idea if you know why.

It is a good idea for a release management purpose, more specifically for:

1/ Packaging issue

If a build artifact is not just an exe (or a dll or...), but also:

some configuration files
some scripts to start/stop/restart your artifact
some sql to update your database
some sources (compressed into a file) to facilitate debugging
some documentation (javadoc compressed in a file)

then it is a good idea to have a build artifact and all those associated files stored in a VCS.
(Because it is not anymore just a matter of "re-building" the artifact, but also of "retrieving" all those extra files that will make that artifact run)

2/ Deployment issue

Suppose you need to deploy many artifacts in different environments (test, homologation, pre-production, production).
If:

you produce many build artifacts
those artifacts are quite long to recreate from scratch

then having those artifacts in a VCS is a good idea, in order to avoid recreating them.
You can just query them from environment to environment.

But you need to remember:

1/ You cannot store every artifact you make in the VCS: the intermediate builds you make for continuous integration must not be stored in the VCS (or you end up with a huge repository full of useless versions of the binaries).
Only the versions needed for homologation and production purposes need to be referenced.
For intermediate builds, you need an external repository (Maven or a shared directory) in order to publish and test those builds quickly.
2/ you should not store them in the same Subversion Repository, since your development is committed (revision number) much more often than your significant builds (the ones deemed worthy of homologation and production deployment)
That means the artifacts stored in that second repository must have a naming convention for the tag (or for a property) in order to easily retrieve the revision number of the development from which they have been built.
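One possible naming convention, purely as an illustration (the `product-version-rREV` format and the `ReleaseTag` class are assumptions, not something the answer prescribes): put the development revision at the end of the tag name so it can be parsed back. A custom property set on the tag with `svn propset` would be the alternative the answer mentions.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical tag naming convention: <product>-<version>-r<devRevision>,
// e.g. "billing-2.3.0-r18452", where 18452 is the development repository revision.
class ReleaseTag {
    private static final Pattern TAG = Pattern.compile("(.+)-([^-]+)-r(\\d+)");

    static String name(String product, String version, long devRevision) {
        return product + "-" + version + "-r" + devRevision;
    }

    /** Returns the development revision encoded at the end of a tag name. */
    static long devRevision(String tagName) {
        Matcher m = TAG.matcher(tagName);
        if (!m.matches()) {
            throw new IllegalArgumentException("not a release tag: " + tagName);
        }
        return Long.parseLong(m.group(3));
    }
}
```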

Answer 6:

In my experience, storing jars in SVN can end in a mess.
I think it is better to save the jar files in a Maven repository like Nexus.
This also has the advantage that you can use a dependency management tool like Maven or Ivy.

Answer 7:

Binaries, especially your own, but also third party, have no place in a source control tool like SVN.

Ideally you should have build scripts to build your own binaries (which can then be automated with one of the many fine automatic build tools that can check the source straight out of SVN).

For third party binaries you will need a dependency management tool like Maven2. You can then set up a local Maven repository to handle all third party binaries (or just rely on the public ones). The local repo can also manage your own binaries.
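Part of what makes a Maven repository (a local one such as Nexus, or the public ones) work as a binary store is that an artifact's location follows from its coordinates. The sketch below builds the path in the default Maven 2 layout for a release artifact; it ignores classifiers and the timestamped file names used for SNAPSHOT deployments, and `MavenLayout` is a hypothetical name.

```java
// Default Maven 2 repository layout for a release artifact:
//   groupId with dots as slashes / artifactId / version / artifactId-version.packaging
class MavenLayout {

    static String path(String groupId, String artifactId, String version, String packaging) {
        return groupId.replace('.', '/') + "/" + artifactId + "/" + version + "/"
                + artifactId + "-" + version + "." + packaging;
    }
}
```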

Answer 8:

Putting the binaries in the trunk or branches is definitely overkill. Besides taking up space like you mention, it also leads to inconsistencies between source and binaries. When you refer to revision 1234, you don't want to wonder whether that means "the build resulting from the source at revision 1234" vs "the binaries in revision 1234". The same rule of avoiding inconsistencies applies to auto-generated code. You should not version what can be generated by the build.

OTOH I'm more or less OK with putting binaries in tags. This way it is easy for other projects to use the binaries of other projects via svn:externals, without needing to build all these dependencies. It also enables testers to easily switch between tags without needing a full build environment.

To get binaries in tags, you can use this procedure:

1. Check out a clean working copy.
2. Run the build script and evaluate any test results.
3. If the build is OK, svn add the binaries.
4. Instead of committing to the trunk or branch, tag directly from your working copy like this: svn copy myWorkingCopyFolder myTagURL
5. Discard the working copy to avoid accidental commits of binaries to the trunk or branch.

We have a tagbuild script to semi-automate steps 3 and 4.
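The tagbuild script itself is not shown. A hypothetical Java sketch of steps 3 and 4 could assemble the two Subversion commands and run them in order; the class name, the use of `svn add --parents`, and the example tag URL are assumptions. Because a working-copy-to-URL `svn copy` commits immediately, it is given a log message with `-m`.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical "tagbuild" helper for steps 3 and 4: add the built binaries to a
// clean working copy, then copy that working copy straight to a tag URL.
class TagBuild {

    static List<List<String>> commands(String workingCopy, List<String> binaries,
                                       String tagUrl, String message) {
        List<String> add = new ArrayList<>(List.of("svn", "add", "--parents"));
        for (String binary : binaries) {
            add.add(workingCopy + "/" + binary); // paths relative to the working copy
        }
        // A working copy -> URL copy commits immediately, so it needs a log message.
        List<String> tag = List.of("svn", "copy", workingCopy, tagUrl, "-m", message);
        return List.of(add, tag);
    }

    /** Runs each command in order and stops at the first failure. */
    static void run(List<List<String>> commands) throws IOException, InterruptedException {
        for (List<String> command : commands) {
            Process process = new ProcessBuilder(command).inheritIO().start();
            if (process.waitFor() != 0) {
                throw new IllegalStateException("failed: " + String.join(" ", command));
            }
        }
    }
}
```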

Answer 9:

One good reason would be to quickly get an executable running on a new machine. In particular if the build environment takes a while to set up. (Load compilers, 3rd party libraries and tools, etc.)

Answer 10:

On my projects, I usually have post-commit hooks that build from a special working copy on the server, in a path reachable from a web browser. That means that after every commit, anyone who can read the internal web can easily download the relevant binaries. No consistency problems, instant updating, and a path towards automated testing.
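Subversion runs the repository's `hooks/post-commit` program after each commit, passing the repository path and the new revision number as its first two arguments. Below is a sketch of a Java entry point that such a hook script might call. The published working-copy path `/var/www/builds/trunk`, the Ant `dist` target, and the `PostCommitBuild` class are assumptions for illustration.

```java
import java.util.List;

// Hypothetical entry point for a post-commit hook, e.g. a hooks/post-commit
// script that runs: java -cp hooks.jar PostCommitBuild "$1" "$2"
class PostCommitBuild {
    // Assumed location of the working copy that the internal web server serves.
    static final String PUBLISHED_WC = "/var/www/builds/trunk";

    static List<List<String>> commands(String revision) {
        long rev = Long.parseLong(revision); // rejects anything that is not a plain number
        return List.of(
                List.of("svn", "update", "-r", Long.toString(rev), PUBLISHED_WC),
                List.of("ant", "-f", PUBLISHED_WC + "/build.xml", "dist"));
    }

    public static void main(String[] args) throws Exception {
        // Subversion passes: args[0] = repository path, args[1] = new revision.
        for (List<String> command : commands(args[1])) {
            int exit = new ProcessBuilder(command).inheritIO().start().waitFor();
            if (exit != 0) {
                System.exit(exit);
            }
        }
    }
}
```

Depending on the server setup, the committing client may wait for the post-commit hook to finish, so a slow build is often started in the background from the hook script.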

Answer 11:

Version control should have everything you need to do: svn co and then build. It shouldn't have intermediates or final product, as that defeats the purpose. You can create a new project in SVN for the result and version the binary result separately (for releases and patches if needed).

Answer 12:

Checking in significant binaries violates a usage principle of source code/SVN, namely that files in source control should possess a meaningful property of difference.

Today's source file is meaningfully different from yesterday's source file; a diff will produce a set of changes which make sense to a human reader. Today's picture of the front of the office does not have a meaningful diff with yesterday's picture of the office.

Because things like images do not possess the concept of difference, WHY are you storing them in a system which exists to record and store the differences between files?

Revision-based storage is about storing histories of changes to files. There is no meaningful change history in the data of (say) JPEG files. Such files can be stored just as well in a plain directory.

More practically, storing large files (build output files) in SVN makes checkout slow. The potential to abuse SVN as a generalised binary repository is there. It all seems fine at first, because there aren't many binary files. Of course, the number of files increases as time passes; I've seen modules which take hours to check out.

It is better to store large associated binary files (and output files) in a directory structure and refer to them from the build process.
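Referring to binaries from the build can be kept verifiable: the repository versions only a small reference (a relative path plus a checksum), while the large file stays in a shared directory. A hypothetical Java sketch; `BinaryReference` and the idea of pinning a SHA-256 are illustrative assumptions, not part of the answer.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical: a small, versioned reference to a large binary kept outside SVN.
class BinaryReference {
    final String relativePath; // location inside the shared binary store
    final String sha256;       // expected checksum, committed alongside the sources

    BinaryReference(String relativePath, String sha256) {
        this.relativePath = relativePath;
        this.sha256 = sha256;
    }

    /** Resolves the reference in the shared store and verifies the file's checksum. */
    Path resolve(Path sharedStore) throws IOException {
        Path file = sharedStore.resolve(relativePath);
        String actual = sha256Hex(Files.readAllBytes(file));
        if (!actual.equalsIgnoreCase(sha256)) {
            throw new IOException("checksum mismatch for " + file);
        }
        return file;
    }

    static String sha256Hex(byte[] data) {
        try {
            StringBuilder hex = new StringBuilder();
            for (byte b : MessageDigest.getInstance("SHA-256").digest(data)) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // Every Java SE platform is required to support SHA-256.
            throw new IllegalStateException(e);
        }
    }
}
```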

Answer 13:

Do you mean you have the sources plus the result of the build in the same repository?

This is a good argument for a daily build, with versioned build scripts in a separate repository. Binaries in a repository are not bad in themselves, but sources plus build results in the same place look bad to me.

If you build several binaries and don't notice a build breakage somewhere, then you end up with binaries from different revision, and you are preparing yourself for some subtle bug chase.
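One way to notice binaries from mixed revisions, assuming your build stamps each jar's manifest with the revision it was built from, is to compare those stamps before publishing. The `SVN-Revision` attribute name and the `RevisionConsistency` class are hypothetical; any custom main-section attribute would do.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.util.Map;
import java.util.TreeSet;
import java.util.jar.Attributes;
import java.util.jar.JarFile;
import java.util.jar.Manifest;

// Hypothetical consistency check over the manifests of a set of jars.
class RevisionConsistency {
    // Custom manifest attribute assumed to be written by the build.
    static final Attributes.Name SVN_REVISION = new Attributes.Name("SVN-Revision");

    /** Reads a jar's manifest; returns null if the jar has none. */
    static Manifest manifestOf(Path jar) throws IOException {
        try (JarFile jarFile = new JarFile(jar.toFile())) {
            return jarFile.getManifest();
        }
    }

    /** Distinct revisions across the jars; a consistent build yields exactly one. */
    static TreeSet<String> revisions(Map<String, Manifest> manifestsByJar) {
        TreeSet<String> seen = new TreeSet<>();
        for (Manifest manifest : manifestsByJar.values()) {
            String rev = manifest == null ? null : manifest.getMainAttributes().getValue(SVN_REVISION);
            seen.add(rev == null ? "<missing>" : rev);
        }
        return seen;
    }
}
```

A daily build could refuse to publish whenever `revisions(...)` returns more than one entry.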

Advocate for a daily, separately versioned autobuild script rather than just arguing against binaries + code.

Answer 14:

Subversion is a source control manager, and binaries are not source.
If you use "svn up" to update production, every developer with commit permission can update, modify, or break production.

Alternatives: use a continuous integration server like Hudson or CruiseControl.

Answer 15:

I think the feeling of having done something bad when binary files are committed to the VCS stems from the basic idea that one should never put redundant things in an archive, for reasons of resource economy and the drawbacks of managing the same data twice.

That is why: if you can easily reconstruct the archived state of your work from the other files of that version, e.g. by simple recompiling or installing standard setups, you should not commit such binaries, but rather commit something like a README or INSTALL file. If the difficulty or risk of failing to reconstruct is too great, do commit.