GitLab Docker executor - cache image after before_script


Question:

In GitLab CI, the .gitlab-ci.yml file has an option to run commands before any of the actual script runs, called before_script. The .gitlab-ci.yml examples illustrate installing ancillary programs here. However, I've noticed that these changes are not cached in Docker when using a Docker executor. I had naively assumed that after running these commands, Docker would cache the resulting image, so that on the next run or test Docker would simply load the cached image produced after before_script. This would drastically speed up builds.

As an example, my .gitlab-ci.yml looks a little like:

image: ubuntu

before_script:
    - apt-get update -qq && apt-get install -yqq make ...

build:
    script:
        - cd project && make

A possible solution is to go to the runner machine and create a Docker image that can build my software without any other installation, and then reference it in the image section of the YAML file. The downside of this is that whenever I want to add a dependency, I need to log in to the runner machine and update the image before builds will succeed. It would be much nicer if I just had to add the dependency to the end of apt-get install and have Docker / GitLab CI handle the appropriate caching.
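Such a prebuilt image would essentially just bake in what before_script installs. A minimal sketch, assuming the same ubuntu base (the image name is only illustrative):

# Dockerfile kept on the runner machine
FROM ubuntu
# pre-install what before_script would otherwise install on every job
RUN apt-get update -qq && apt-get install -yqq make

It would then be built on the runner host with something like docker build -t my-build-image . and referenced via image: my-build-image in .gitlab-ci.yml.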

There is also a cache keyword in .gitlab-ci.yml, which I tried setting to untracked: true; I thought this would cache everything that wasn't a byproduct of my project, but it didn't seem to have any effect.
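For reference, what I tried looked roughly like this (the commented paths entry is just the usual illustration). As far as I can tell, cache only archives files under the project directory, which would explain why packages installed system-wide by apt-get are never picked up:

cache:
  untracked: true
  # the usual usage is to cache paths inside the project, e.g.:
  # paths:
  #   - vendor/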

Is there any way to get the behavior I desire?

Answer 1:

You can add a stage that builds the image first. If the image hasn't changed, that stage is very quick, under 1 second, because Docker's layer cache is reused.

You can then use that image in the following stages, speeding up the whole process.

This is an example of a .gitlab-ci.yml:

stages:
  - build_test_image
  - test

build_test:
  stage: build_test_image
  script:
    - docker login -u gitlab-ci-token -p $CI_BUILD_TOKEN $CI_REGISTRY
    - docker build -t $CI_REGISTRY_IMAGE:test -f dockerfiles/test/Dockerfile .
    - docker push $CI_REGISTRY_IMAGE:test
  tags:
    - docker_build

test_syntax:
  image: $CI_REGISTRY_IMAGE:test
  stage: test
  script:
    - pip install flake8
    - flake8 --ignore=E501,E265 app/

Note the docker_build tag. It forces the job to run on the runner that carries that tag. That runner uses the shell executor and is used only to build Docker images, so the host where the runner lives must have Docker Engine installed. I found this solution fit my needs better than Docker-in-Docker and other approaches.
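For completeness, registering such a shell-executor runner with that tag could look something like this (the URL, token and description are placeholders):

gitlab-runner register \
  --non-interactive \
  --url https://gitlab.example.com/ \
  --registration-token YOUR_REGISTRATION_TOKEN \
  --executor shell \
  --description "docker image builder" \
  --tag-list docker_build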

Also, I'm using a private registry, which is why I use the $CI_REGISTRY* variables. You can use Docker Hub without specifying a registry, though you would then have to handle authentication against Docker Hub yourself.
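The dockerfiles/test/Dockerfile referenced above is not shown here; a hypothetical sketch of such a test image could be as simple as:

# hypothetical dockerfiles/test/Dockerfile
FROM python:3
# bake in whatever the test jobs need on top of the base image
COPY requirements.txt /tmp/requirements.txt
RUN pip install -r /tmp/requirements.txt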



Answer 2:

The way I handle this is to keep custom images on Docker Hub for each of our projects and reference them from .gitlab-ci.yml. If I need a new dependency, I edit the Dockerfile used to create the initial image, rebuild the image, tag it with a specific tag, and push it to Docker Hub, roughly like this:

cat "RUN apt-get install gcc" >> Dockerfile
ID=$(docker build)
docker tag $ID ACCOUNT/gitlab_ci_image:gcc
docker push ACCOUNT/gitlab_ci_image

Then I update the .gitlab-ci.yml file to point to that specific version of the image.

image: ACCOUNT/gitlab_ci_image:gcc

build:
    script:
        - cd project && make

This allows me to have different dependencies depending on which commit I am testing (since the .gitlab-ci.yml file within that commit tells the runner which image to use). It also avoids installing the dependencies every time a test runs on a particular runner, because the runner reuses the same image as long as it doesn't change.

The other nice thing is that, with the images hosted on Docker Hub, a runner that needs a specific tag it doesn't have locally will pull the correct one automatically. You can therefore have 10 runners and still maintain only a single image, and that maintenance can be done from your own workstation or any other machine.

I personally think this is a much better solution than attempting to cache anything within a runner's image, particularly when you create a new branch to test your code against a newer version of a dependency. With caching, you would struggle to keep separate testing environments for your stable and dev branches. In my opinion, tests should also run in as clean an environment as possible, and this setup accomplishes that.