How do people deal with persistent storage for their Docker containers?
I am currently using this approach: build the image, e.g. for PostgreSQL, and then start the container with
docker run --volumes-from c0dbc34fd631 -d app_name/postgres
IMHO, that has the drawback that I must never (not even by accident) delete the container "c0dbc34fd631".
Another idea would be to mount host volumes into the container with "-v"; however, the user ID inside the container does not necessarily match the user ID on the host, and then permissions might get messed up.
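For illustration, the bind-mount variant I mean would look roughly like this (the host path is just a placeholder):

# Hypothetical bind mount of a host directory onto PostgreSQL's data directory
docker run -d \
  -v /srv/postgres-data:/var/lib/postgresql/data \
  app_name/postgres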
Note: Instead of --volumes-from 'cryptic_id' you can also use --volumes-from my-data-container, where my-data-container is a name you assigned to a data-only container, e.g. docker run --name my-data-container ... (see the accepted answer).
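As a hedged sketch of that pattern (the image and data path are only examples, not part of the note above):

# Create a named data-only container that just holds a volume and exits
docker run --name my-data-container -v /var/lib/postgresql/data busybox true

# Run the actual database container with the volumes of the data container
docker run -d --volumes-from my-data-container app_name/postgres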
@tommasop's answer is good and explains some of the mechanics of using data-only containers. But as someone who initially thought data containers were silly when one could just bind mount a volume to the host (as suggested by several other answers), yet now realizes that data-only containers are actually pretty neat, I can suggest my own blog post on this topic: Why Docker Data Containers (Volumes!) are Good
See also: my answer to the question "What is the (best) way to manage permissions for Docker shared volumes?" for an example of how to use data containers to avoid problems like permissions and uid/gid mapping with the host.
To address one of the OP's original concerns: that the data container must not be deleted. Even if the data container is deleted, the data itself will not be lost as long as any container has a reference to that volume, i.e. any container that mounted the volume via --volumes-from. So unless all the related containers are stopped and deleted (one could consider this the equivalent of an accidental rm -fr /), the data is safe. You can always recreate the data container by running --volumes-from against any container that has a reference to that volume. As always, make backups though!
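A hedged sketch of that recreation step (the container names are made up for the example):

# Suppose the original data container was deleted, but app_container still
# mounts its volume via --volumes-from; a new data container can adopt it:
docker run --name my-new-data-container --volumes-from app_container busybox true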
UPDATE: Docker now has volumes that can be managed independently of containers, which makes this even easier to manage.
As of Docker Compose 1.6, there is now improved support for data volumes in Docker Compose. The following compose file will create a data volume that persists between restarts (or even removal) of the parent containers.
Here is the blog announcement: Compose 1.6: New Compose file for defining networks and volumes
Here's an example compose file:
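The example file itself did not survive the copy here; a minimal sketch in the new version 2 format, assuming a PostgreSQL service and a named db_data volume:

version: "2"

services:
  db:
    image: postgres:9.4
    volumes:
      # mount the named volume at PostgreSQL's data directory
      - db_data:/var/lib/postgresql/data

volumes:
  db_data: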
As far as I can understand, this will create a data volume (db_data) which will persist between restarts.

If you run docker volume ls you should see your volume listed, and docker volume inspect gives you some more details about it:
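Roughly like this (the myapp_ prefix comes from the Compose project name, and the inspect output is abridged):

$ docker volume ls
DRIVER              VOLUME NAME
local               myapp_db_data

$ docker volume inspect myapp_db_data
[
    {
        "Name": "myapp_db_data",
        "Driver": "local",
        "Mountpoint": "/var/lib/docker/volumes/myapp_db_data/_data"
    }
]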
Some testing:
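One way to exercise it, assuming the compose file above (the container name myapp_db_1 follows Compose's default project_service_index naming):

# start the stack and write something into the database
docker-compose up -d
docker exec -it myapp_db_1 psql -U postgres -c "CREATE TABLE t (x int); INSERT INTO t VALUES (1);"

# remove the containers entirely (named volumes are kept), then recreate them
docker-compose down
docker-compose up -d

# the row is still there because db_data survived the removal
docker exec -it myapp_db_1 psql -U postgres -c "SELECT * FROM t;"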
Notes:
You can also specify various drivers in the volumes block. For example, you could specify the Flocker driver for db_data, as sketched below.

Disclaimer: This approach is promising, and I'm using it successfully in a development environment. I would be apprehensive to use this in production just yet!
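A minimal sketch of such a driver entry (whether you need additional driver_opts depends on your Flocker setup):

volumes:
  db_data:
    driver: flocker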
I recently wrote about a potential solution and an application demonstrating the technique. I find it to be pretty efficient during development and in production. Hope it helps or sparks some ideas.
Repo: https://github.com/LevInteractive/docker-nodejs-example
Article: http://lev-interactive.com/2015/03/30/docker-load-balanced-mongodb-persistence/
My solution is to make use of the new docker cp, which is now able to copy data out of containers no matter whether they are running or not, and to share a host volume at the exact same location where the database application creates its database files inside the container. This double solution works without a data-only container, straight from the original database container.

So my systemd init script takes on the job of backing up the database into an archive on the host. I placed a timestamp in the filename so that a file is never overwritten.
It does this in ExecStartPre, and it does the same thing again in ExecStopPost:
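The original directives aren't shown here; purely as an illustration of the idea (the container name my-db, the MariaDB data path, and the backup folder are assumptions, not the actual lines from my repo), they could look roughly like this:

# In the [Service] section of the unit file; '%%' and '$$' are systemd escapes
# for literal '%' and '$', and the leading '-' tolerates a failing command.
ExecStartPre=-/bin/sh -c 'docker cp my-db:/var/lib/mysql /srv/backups/db-$$(date +%%Y%%m%%d-%%H%%M%%S)'
ExecStopPost=-/bin/sh -c 'docker cp my-db:/var/lib/mysql /srv/backups/db-$$(date +%%Y%%m%%d-%%H%%M%%S)'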
Plus I exposed a folder from the host as a volume to the exact same location where the database is stored:
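Again only as a sketch with placeholder names and paths, the run command would carry something like:

docker run -d --name my-db \
  -v /srv/mariadb-data:/var/lib/mysql \
  mariadb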
It works great on my VM (I'm building a LEMP stack for myself): https://github.com/DJviolin/LEMP

But I just don't know whether it is a "bulletproof" solution when your life actually depends on it (for example, a webshop with transactions happening at any possible millisecond)?
At 20 min 20 secs from this official Docker keynote video, the presenter does the same thing with the database:
Getting Started with Docker
When using Docker Compose, simply attach a named volume, for example:
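Something along these lines (the MySQL image and mount path are just an example):

version: '2'

services:
  db:
    image: mysql:5.6
    volumes:
      # named volume holding the MySQL data directory
      - db_data:/var/lib/mysql

volumes:
  db_data: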
In case it is not clear from update 5 of the selected answer, as of Docker 1.9, you can create volumes that can exist without being associated with a specific container, thus making the "data-only container" pattern obsolete.
See Data-only containers obsolete with docker 1.9.0? #17798.
I think the Docker maintainers realized the data-only container pattern was a bit of a design smell and decided to make volumes a separate entity that can exist without an associated container.
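A quick illustration of that standalone-volume workflow (the volume name is arbitrary):

# create a volume that is not tied to any container
docker volume create --name pg_data

# mount it by name; the data outlives this container
docker run -d -v pg_data:/var/lib/postgresql/data postgres:9.4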