How does CI affect semantic versioning?

2020-08-01 06:35发布

问题:

In Countinous Delivery book, it's recommended to keep everything - including CI scripts - in the version control. Actually, current CI systems like gitlab CI already follow this rule of thumb and search for CI scripts in the same codebase.
On the other hand, we are versioning our codebase (and it's built artifacts) whenever it changes. And we follow semantic versioning for that; incrementing patch field for bugfixes, minor for non-breaking features, and so on...
And we make sure the version is incremented between commits by checking it in the CI.
But, there are commits that only change the CI scripts; i.e. adding an analysis job, optimizing another, etc.
My question, after this long boring preface, is that what is the best practice to versioning such changes to the CI? Since it possibly can affect the final built artifact (e.g. changing a build flag in the CI job for optimization or ...).
Is it ok to increment the version in this case?

回答1:

Git is a revision control system. Every time you commit something to a git repo, it labels the content of the repo with a content hash value that represents that version of the repo. Semantic versioning of a git repo's content is redundant and pointless. The whole point of SemVer is to provide a means for producers to communicate risk to consumers. In other words, semantic versioning is intended for build product labeling, not the bits that go into producing the build.

If you attempt to apply SemVer semantics to the repo, you are labeling the product inputs, not the product itself. You should not apply a SemVer string until after all unit/regression/acceptance tests have been performed. How else can you have any certainty whether the code/build-script changes have broken anything?


Pre-build labeling cannot work. Build processes that are capable of reproducing the exact same output twice in a row, are extremely rare, if any exist at all. It is a violation of best practice to have multiple API's/packages in the world with the same SemVer string attached to them. If you label the repo content and then forward that label to the build output, every time you run the build, you produce a package with different content. There will always be some risk that more than one of those outputs will be released into the wild. Many security conscious consumers pay close attention to the content hash of packages they consume. Detecting that a particular producer has released multiple package hashes without bumping the version number, will raise red flags and lead to mistrust of that producers internal processes.


This is a very deep topic that can't be fully covered here. Other issues to consider are OS/Compiler/Tool chain updates. Will you also be committing the entire build tool chain to the same repo? This is an untenable approach, full of hazards I cannot fully enumerate, without taking a few months off work to document them.

Best practice:

  • Use semantic commit messages that clearly state the developer's intent.
  • Validate build outputs prior to packaging/labeling.
  • Always keep humans in the loop, for non-prerelease publications.