My goal is to build a Docker image that runs a Python script producing a lot of CSV files full of random numbers; once the script finishes, these files are to be written to an external storage drive, after which the container quits. Assume that it writes so many of these CSV files that they cannot all be held in memory.
What I'm worried about are the cases where the container encounters an error and quits (or is made to quit by the user), leaving behind a bunch of garbage files that have to be cleaned up manually.
The first solution is to mount a fast drive (like an SSD) directly into the container and write to it. Once the script is done, it transfers the data from the SSD to the external storage drive. The downside is that if the container quits unexpectedly, it leaves garbage behind on the SSD.
The second solution is to create a volume backed by the SSD, start a container with this volume, and then do pretty much the same as in the first solution. In this case, if the container dies unexpectedly, what happens to the volume? Will it be removed automatically as well? Can it be configured to be removed automatically, thereby deleting any garbage that was created?
In case you're curious, the final goal is to use these containers with some sort of orchestration system.
If you write your intermediate files into the container filesystem, rather than to a persistent volume, then Docker can do all the hard work for you. Simply run your container with the remove option (`--rm`). E.g. if you did something like this:
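(The image name `csv-generator` and the host path `/mnt/external` below are made-up placeholders; adjust them for your usage.)

```bash
# Mount the external storage drive at /final/result; --rm makes Docker delete
# the container, and anything written outside that mount, when it exits.
docker run --rm -v /mnt/external:/final/result csv-generator
```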
Then your application can write to anywhere other than /final/result, and upon exit of the container (successful or any other error condition), the container will be automatically deleted by the Docker daemon. On successful completion of your task, write your content to /final/result so that it is persisted after the container exits. This path is completely made up and you'll likely want to adjust it for your usage.
Note that if you are running in a desktop environment (Mac/Windows) rather than on native Linux, there is an issue with the VM disk expanding with usage and not shrinking as files are deleted. This is the nature of VM filesystems that allocate on use, and it is outside of Docker's control. In that scenario, you'd likely want the entire setup to use an external volume, and to configure your entrypoint to clean up any temporary files left over from the previous run of your container.
Note that you may configure your ENTRYPOINT Python script to automatically perform the necessary cleanup.
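A minimal sketch of what such an ENTRYPOINT script could look like, assuming the intermediates are written under a temporary scratch directory inside the container (the `/final/result` destination and the `generate_csv_files` stand-in are illustrative placeholders, not your actual setup); the signal handlers play the role that a shell `trap` plays in the approach referenced below:

```python
#!/usr/bin/env python3
"""Illustrative ENTRYPOINT: write intermediates to a scratch dir, clean it up
on any exit, and persist results only on success."""
import csv
import random
import shutil
import signal
import sys
import tempfile
from pathlib import Path

FINAL_RESULT = Path("/final/result")  # made-up mount point, adjust as needed


def generate_csv_files(outdir: Path) -> None:
    # Stand-in for the real workload: one small CSV of random numbers.
    with open(outdir / "sample.csv", "w", newline="") as fh:
        writer = csv.writer(fh)
        for _ in range(10):
            writer.writerow([random.random() for _ in range(5)])


def main() -> None:
    scratch = Path(tempfile.mkdtemp(prefix="csv-job-"))  # intermediates live here

    def cleanup(signum=None, frame=None):
        # Remove the scratch dir; also invoked on SIGTERM/SIGINT (e.g. `docker stop`),
        # which is the Python counterpart of a shell `trap`.
        shutil.rmtree(scratch, ignore_errors=True)
        if signum is not None:
            sys.exit(1)

    signal.signal(signal.SIGTERM, cleanup)
    signal.signal(signal.SIGINT, cleanup)

    try:
        generate_csv_files(scratch)
        FINAL_RESULT.mkdir(parents=True, exist_ok=True)
        for csv_file in scratch.glob("*.csv"):
            # Persist only on success, once all intermediates have been produced.
            shutil.move(str(csv_file), str(FINAL_RESULT / csv_file.name))
    finally:
        cleanup()


if __name__ == "__main__":
    main()
```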
To give you some guidelines/examples of that approach: see e.g. the handling of termination signals (relying on the shell built-in `trap`) in this SO answer.

Note that beyond the graceful termination of your containers, you may want to set up a `restart` policy, such as `always` or `unless-stopped`. See for example this Codeship blog article.
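As a rough illustration of setting such a policy (the image name is again a placeholder):

```bash
# Restart the container whenever it exits, unless it was explicitly stopped.
docker run --restart unless-stopped csv-generator
```

Note that, as far as I know, a restart policy cannot be combined with the `--rm` flag, so you'd pick one or the other.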
Although the two solutions you present aren't strictly necessary to address the main question of this thread, I have to mention that, in general, it is a best practice to use volumes in production rather than a mere bind-mount. But of course, using either of these two approaches (`-v volume-name:/path`, or a bind-mount `-v /path:/path`) is way better than not using the `-v` option at all, because writing data directly in the writable layer of the container implies this data will be lost if the container is recreated from the image.
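For reference, a quick sketch of both forms (the volume name `csv-data`, the host path `/mnt/external` and the image name are placeholders):

```bash
# Named volume, created and managed by Docker (generally preferred in production)
docker volume create csv-data
docker run --rm -v csv-data:/final/result csv-generator

# Bind-mount of a host directory
docker run --rm -v /mnt/external:/final/result csv-generator
```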