TL;DR
Running COPY . /app on top of an image that contains only slightly outdated source code creates a new layer as large as the whole source tree, even when only a few bytes have changed.
Is there a way to add only changed files to this docker image as a new layer - without resorting to docker commit?
Long version:
When deploying our application to production, we need to add the source code to the image. A very simple Dockerfile is used for this:
FROM neam/dna-project-base-debian-php:0.6.0
COPY . /app
Since the source code is huge (1.2 GB), this makes for quite a hefty push upon each deploy:
$ docker build -f .stack.php.Dockerfile -t project/project-web-src-php:git-commit-17c279b .
Sending build context to Docker daemon 1.254 GB
Step 0 : FROM neam/dna-project-base-debian-php:0.6.0
---> 299c10c416fc
Step 1 : COPY . /app
---> 78a30802804a
Removing intermediate container 13b49c323bb6
Successfully built 78a30802804a
$ docker tag -f project/project-web-src-php:git-commit-17c279b tutum.co/project/project-web-src-php:git-commit-17c279b
$ docker login --email=tutum-project@project.com --username=project --password=******** https://tutum.co/v1
WARNING: login credentials saved in /home/dokku/.docker/config.json
Login Succeeded
$ docker push tutum.co/project/project-web-src-php:git-commit-17c279b
The push refers to a repository [tutum.co/project/project-web-src-php] (len: 1)
Sending image list
Pushing repository tutum.co/project/project-web-src-php (1 tags)
Image a604b236bcde already pushed, skipping
Image 1565e86129b8 already pushed, skipping
...
Image 71156b357f2f already pushed, skipping
Image 299c10c416fc already pushed, skipping
78a30802804a: Pushing [=========> ] 234.2 MB/1.254 GB
Upon the next deploy, we only want to add the changed files to the image. However, running COPY . /app on top of the previously pushed image requires us to push 1.2 GB worth of source code AGAIN, even when we only change a few bytes worth of source code:
New Dockerfile (.stack.php.git-commit-17c279b.Dockerfile):
FROM project/project-web-src-php:git-commit-17c279b
COPY . /app
After changing a few files (adding some text and code), then building and pushing:
$ docker build -f .stack.php.git-commit-17c279b.Dockerfile -t project/project-web-src-php:git-commit-17c279b-with-a-few-changes .
Sending build context to Docker daemon 1.225 GB
Step 0 : FROM project/project-web-src-php:git-commit-17c279b
---> 4dc643a45de3
Step 1 : COPY . /app
---> ecc7adc194c4
Removing intermediate container cb3e87c6cb7a
Successfully built ecc7adc194c4
$ docker tag -f project/project-web-src-php:git-commit-17c279b-with-a-few-changes tutum.co/project/project-web-src-php:git-commit-17c279b-with-a-few-changes
$ docker push tutum.co/project/project-web-src-php:git-commit-17c279b-with-a-few-changes
The push refers to a repository [tutum.co/project/project-web-src-php] (len: 1)
Sending image list
Pushing repository tutum.co/project/project-web-src-php (1 tags)
Image 1565e86129b8 already pushed, skipping
Image a604b236bcde already pushed, skipping
...
Image fe64bff23cf8 already pushed, skipping
Image 71156b357f2f already pushed, skipping
ecc7adc194c4: Pushing [==> ] 68.21 MB/1.225 GB
There is a workaround that achieves small layers, described in "Updating docker images with small changes using commits": launch an rsync process within the image, then use docker commit to save the new contents as a new layer. However, as mentioned in that thread, this is unorthodox since the image is not built from a Dockerfile, and we prefer an orthodox solution that does not rely on docker commit.
Is there a way to add only changed files to this docker image as a new layer - without resorting to docker commit?
Docker version 1.8.3
Docker caching works per layer / instruction in the Dockerfile. In this case, the files used in that layer (everything in the build context, .) are modified, so the layer needs to be rebuilt.
If there are specific parts of the code that don't change often, you could consider adding those in a separate layer, or even moving them to a "base image".
It may take some planning or restructuring for this to work, depending on your project, but may be worth doing.
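As a rough illustration of splitting the source into layers, the sketch below writes a Dockerfile that copies a rarely-changing directory before a frequently-changing one. The directory names vendor and src are hypothetical and not taken from the question; the base image is the one the question uses. The function name write_layered_dockerfile is likewise made up for this example.

```shell
# Sketch: generate a Dockerfile that copies rarely-changing directories first.
# "vendor" and "src" are hypothetical directory names (assumptions).
write_layered_dockerfile() {
    out_file=$1
    cat > "$out_file" <<'EOF'
FROM neam/dna-project-base-debian-php:0.6.0
# Rarely changes: this layer can be reused from the build cache
COPY vendor /app/vendor
# Changes on most deploys: only this layer needs to be rebuilt
COPY src /app/src
EOF
}
```

Calling write_layered_dockerfile Dockerfile and then running docker build as before should, as long as the build cache is available, leave the first COPY layer untouched when only files under src have changed.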
Actually, the solution IS to use COPY . /app as the OP is doing. There is, however, an open bug causing this not to work as expected on most systems.
The only currently feasible workaround to this issue seems to be to use rsync to analyze the differences between the old and new images prior to pushing the new one, then use the changelog output to generate a tar file containing the relevant changes, which is subsequently COPY:ed to a new image layer.
This way, the layer sizes become a few bytes or kilobytes for smaller changes instead of 1.2 GB every time.
I put together documentation and scripts to help out with this over at https://github.com/neam/docker-diff-based-layers.
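Independently of those scripts, the core idea can be sketched in a few lines of shell. This is a minimal sketch, not the code from the linked repository: it assumes the old and new source trees are available as two local directories, and it swaps rsync for plain find and cmp so that it runs without extra tools. The function names list_changed, list_removed and make_changes_tar are made up for this example.

```shell
# list_changed OLD_DIR NEW_DIR
# Print relative paths of files that are new or modified in NEW_DIR.
list_changed() {
    old_dir=$1
    new_dir=$2
    (cd "$new_dir" && find . -type f | sort) | while IFS= read -r path; do
        if [ ! -f "$old_dir/$path" ] || ! cmp -s "$old_dir/$path" "$new_dir/$path"; then
            printf '%s\n' "${path#./}"
        fi
    done
}

# list_removed OLD_DIR NEW_DIR
# Print relative paths of files that exist only in OLD_DIR.
list_removed() {
    old_dir=$1
    new_dir=$2
    (cd "$old_dir" && find . -type f | sort) | while IFS= read -r path; do
        [ -e "$new_dir/$path" ] || printf '%s\n' "${path#./}"
    done
}

# make_changes_tar OLD_DIR NEW_DIR OUT_TAR
# Pack only the added or modified files into OUT_TAR.
# The file list is kept next to the archive as OUT_TAR.list.
make_changes_tar() {
    list_changed "$1" "$2" > "$3.list"
    tar -cf "$3" -C "$2" -T "$3.list"
}
```

The resulting archive could then be added to the previous image in a Dockerfile; Docker's ADD instruction unpacks a local tar archive into the target directory. Files reported by list_removed would still have to be deleted in a separate step, since an archive of changed files cannot express removals.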
The end results are shown below:
Verify that basing the project images on the revision 1 image tag contents does not lead to the desired outcome
Verify that subsequent COPY . /app commands re-add all files in every layer instead of only the files that have changed:
Output:
Even though we added/changed only a few bytes, all files are re-added and 16.78 MB is added to the total image size.
Also, the file(s) that we removed did not get removed.
Create an image with an optimized layer
Verify that the processed new image has smaller sized layers with the changes:
Output:
Verify that the processed new image contains the same contents as the original:
The output should indicate that there are no differences between the images/tags. Thus, the sample-project:revision-2-processed tag can now be pushed and deployed, giving the same end result without having to push an unnecessary 16.78 MB over the wire, which makes for faster deploy cycles.
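A content check of this kind can be approximated with diff. The sketch below assumes the filesystems of both images have already been unpacked into two local directories (for example from the output of docker export on a container created from each tag); the function name compare_trees is hypothetical. Note that diff -r compares file contents only, so differences in ownership or permissions would go unnoticed.

```shell
# compare_trees DIR_A DIR_B
# Succeeds when both trees hold the same files with the same contents.
compare_trees() {
    if diff -r -q "$1" "$2"; then
        echo "no differences"
    else
        echo "trees differ"
        return 1
    fi
}
```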
My solution: (idea from https://github.com/neam/docker-diff-based-layers !)