We are currently running a private registry on one server, hosting all our images on it. If that server crashes, we basically lose all our images. We would like to find a way to make our images highly available. An easy solution I see would be to have a registry instance per server. A load balancer would redirect (round robin) the traffic to one of the available registry instances. The registry instances would share the same network data drive (NFS) to store the images.
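Roughly what I have in mind on each server (just a sketch; the image tag, port, and NFS mount path are placeholders):

```yaml
version: "2"
services:
  registry:
    image: registry:2
    ports:
      - "5000:5000"
    volumes:
      # /mnt/nfs/registry is the shared NFS mount where all instances would store images
      - /mnt/nfs/registry:/var/lib/registry
    restart: always
```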
Do you see any problems with this solution? For example: if one user pushes an image to one instance and another user pushes to a different instance (due to the load balancer's round-robin decision), would that create any lock files on the NFS?
Thanks for your feedback
The registry needs to cache some metadata and uses an in-memory cache by default. You can change the cache configuration to use Redis instead:
https://docs.docker.com/registry/configuration/#cache
Example config:
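(A sketch based on the linked page; the Redis address, password, and storage path below are placeholders.)

```yaml
version: 0.1
storage:
  filesystem:
    rootdirectory: /var/lib/registry   # placeholder path; swap for your shared storage
  cache:
    blobdescriptor: redis              # use Redis instead of the default "inmemory" cache
redis:
  addr: redis.example.com:6379         # placeholder Redis endpoint
  password: asecret                    # placeholder password
  db: 0
http:
  addr: :5000
```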
You should also change the storage backend to S3 or some other shared/global storage. I run two registries for high availability.
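For example, the storage section could point at S3 instead of the local filesystem (again a sketch; the credentials, region, and bucket name are placeholders):

```yaml
storage:
  s3:
    accesskey: AKIAEXAMPLE          # placeholder credentials
    secretkey: examplesecret
    region: us-east-1               # placeholder region
    bucket: my-registry-images      # placeholder bucket
  cache:
    blobdescriptor: redis           # keep the shared Redis cache
```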
There is some information on this on the docker-registry website. In short, it seems designed to support multiple registries talking to the same data store, so you shouldn't see any problems.
If reliability is a real issue for you, it might be wise to look at one of the commercial offerings, e.g. the enterprise Hub or the CoreOS Enterprise Registry (although these seem to stress security and access controls rather than HA).
It's possible to back the registry with S3 as described here. It's worth running the registry in a container so you can instantly launch another in the event of a catastrophic host/data centre failure. GCloud and OpenStack are also supported by the registry.
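As a rough sketch of that (the image tag, port, and config path are my assumptions, not required values), a Compose file could run the registry from the stock image with an S3-backed config mounted in:

```yaml
version: "2"
services:
  registry:
    image: registry:2
    ports:
      - "5000:5000"
    volumes:
      # config.yml would hold your S3-backed registry configuration (placeholder path)
      - ./config.yml:/etc/docker/registry/config.yml:ro
    restart: always
```

That way a replacement instance can be started on any host and immediately serve the same images.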
If you are concerned about data loss, add redundancy to your persistence layer and ensure regular backups. You should also ensure your builds are idempotent so you can rebuild images if absolutely necessary.