I have a Python script that, in a loop:
- Downloads video chunks from AWS S3 to a folder /filename
- Sorts the files in order and concatenates them
- Uploads the entire processed video file to AWS S3
- Deletes the folder /filename
It continues looping until the AWS SQS queue is empty.
The script works great! I have run it for months. The hard drive usage varies but never gets above 5%, depending on the size of the video.
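For reference, the sort-and-concatenate step of a loop like this can be sketched as follows (local filesystem part only; the S3/SQS calls are omitted, and all names here are hypothetical):

```python
import shutil
from pathlib import Path

def concat_chunks(chunk_dir: str, output_path: str) -> None:
    """Concatenate sorted chunk files into one output file, then delete the folder."""
    chunks = sorted(Path(chunk_dir).iterdir())  # sort files in order by name
    with open(output_path, "wb") as out:
        for chunk in chunks:
            out.write(chunk.read_bytes())
    shutil.rmtree(chunk_dir)  # delete the folder /filename after processing
```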
I decided to put this script in a Docker container and run it with docker-compose so I could run a bunch of instances at a time.
The problem is the hard drive fills up! I know that with 5 running, disk usage will be higher, but when processing is done the files get deleted.
But with Docker there seems to be a cache or something. I exec into each container and they are running fine, deleting old files and all.
I have no clue why running in a Docker container versus running as a service would have this impact on the hard drive.
Any direction would be great.
To add to this: when I rm the Docker containers, the hard drive space frees up. I ran docker ps -s and the space used by the containers is not crazy. It just seems like when you rm a file inside a Docker container, it never really removes it.
If you're downloading the files to a directory NOT volume-mapped from the host, the Docker container will not release the used disk space until the container is removed. Anything done in the container is ephemeral, and the HOST doesn't know the state of what's going on inside the container.
In this sense it's a lot like a virtual machine image, backed by a file that grows as needed but never shrinks. Docker keeps a directory for each running container that tracks its changes. On the host you can find the files backing a running container in
/var/lib/docker/containers/<id>
If you need your containers to share disk space, I'd recommend mapping a shared volume from the host into each Docker container.
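With docker-compose, that host mapping might look something like this (the service name, image name, and paths are hypothetical placeholders):

```yaml
# docker-compose.yml -- each worker bind-mounts the same host directory,
# so downloads and deletions happen directly on the host disk
services:
  worker:
    image: my-video-worker        # hypothetical image name
    volumes:
      - /host/dir:/container/dir  # host path : container path
```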
Try the following:
docker run -it -v /host/dir:/container/dir ubuntu
The above runs the ubuntu image in interactive terminal mode, mounting the host's directory /host/dir at /container/dir inside the running container. Anything the container writes to /container/dir will appear in the host's /host/dir, and any other containers mounting it will see the changes as well. Just remember that anything done in the shared volume is seen by all containers that mount it, so be careful when adding and deleting files/directories from it!
I would suggest you use volumes, and mount these volumes in your containers. Changes on volumes take effect on the host immediately, as opposed to changes made to the container's filesystem (which are not really removed until you delete the container).
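For example, a named volume in docker-compose (all names here are hypothetical):

```yaml
services:
  worker:
    image: my-video-worker       # hypothetical image name
    volumes:
      - scratch:/container/dir   # named volume instead of the container's writable layer

volumes:
  scratch: {}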
Have a look at the Docker documentation on volumes.