How Docker calculates the hash of each layer? Is i

Posted 2020-08-09 08:40

Question:

I tried to find this information around the Docker official docs, but had no success.

Which pieces of information does Docker take into account when calculating the hash of each commit/layer?

It's pretty obvious that the line in the Dockerfile is part of the hash, as is the parent commit hash. But is anything else taken into account when calculating this hash?

Concrete use case: suppose two devs on different machines, at different points in time (and therefore with different Docker daemons and different caches), run $ docker build ... against the same Dockerfile. The FROM ... directive gives them the same starting point, but will each subsequent operation produce the same hash on both machines? Is it deterministic?

Answer 1:

Thanks, @thaJeztah. The answer is in https://gist.github.com/aaronlehmann/b42a2eaf633fc949f93b#id-definitions-and-calculations

  1. layer.DiffID: ID for an individual layer

    Calculation: DiffID = SHA256hex(uncompressed layer tar data)

  2. layer.ChainID: ID for a layer and its parents. This ID uniquely identifies a filesystem composed of a set of layers.

    Calculation:

    • For bottom layer: ChainID(layer0) = DiffID(layer0)
    • For other layers: ChainID(layerN) = SHA256hex(ChainID(layerN-1) + " " + DiffID(layerN))

  3. image.ID: ID for an image. Since the image configuration references the layers the image uses, this ID incorporates the filesystem data and the rest of the image configuration.

    Calculation: SHA256hex(imageConfigJSON)
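
To make the three calculations above concrete, here is a rough Python sketch. It is not Docker's actual code: the function names are made up for illustration, and it assumes IDs are written as digest strings with a "sha256:" prefix (the form used in the OCI image spec) rather than the bare hex that SHA256hex in the gist suggests.

```python
import hashlib


def sha256_digest(data: bytes) -> str:
    # Assumption: IDs are rendered as "sha256:<hex>" digest strings.
    return "sha256:" + hashlib.sha256(data).hexdigest()


def diff_id(uncompressed_layer_tar: bytes) -> str:
    # DiffID: hash of the uncompressed layer tar data.
    return sha256_digest(uncompressed_layer_tar)


def chain_ids(diff_ids: list[str]) -> list[str]:
    # ChainID of the bottom layer is its DiffID; every later ChainID
    # hashes the previous ChainID, a single space, and the layer's DiffID.
    chain: list[str] = []
    for d in diff_ids:
        if not chain:
            chain.append(d)
        else:
            chain.append(sha256_digest(f"{chain[-1]} {d}".encode("ascii")))
    return chain


def image_id(image_config_json: bytes) -> str:
    # image.ID: hash of the exact bytes of the image configuration JSON.
    return sha256_digest(image_config_json)


if __name__ == "__main__":
    # Stand-in byte strings, not real layer tars.
    layers = [b"layer 0 tar bytes", b"layer 1 tar bytes"]
    ids = [diff_id(layer) for layer in layers]
    for i, chain_id in enumerate(chain_ids(ids)):
        print(f"layer{i}: DiffID={ids[i]} ChainID={chain_id}")
```

Note that everything here is a pure function of bytes: the same tar bytes and the same config JSON always give the same IDs. So whether two builds end up with the same hash comes down to whether they produce byte-identical layer tars and config, which can differ, for example through file timestamps recorded in the tar.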