I'm completely new to Docker. I'm using it to train neural networks.
I've got a running container executing a script that trains a NN and saves its weights in the container's writable layer. I've recently realized that this setup is incorrect (I hadn't properly RTFM), and the NN weights will be lost after the training finishes.
I've read answers and recipes about volumes and persistent data storage. All of them express the same idea: the data storage must be prepared in advance, before the container is started.
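For reference, my understanding is that "preparing in advance" means starting the container with a bind mount (or named volume), something like the following, where the host path and the image name are just placeholders:

# mount a host directory into the container so files written there survive
docker run -v /host/path/weights:/root/weights my-training-image

That option obviously has to be passed before the container starts, which is exactly what I failed to do.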
My container, however, is already running. I understand that the incorrect setup is my own fault; still, I do not want to lose the results of the run that is currently in progress. Is it possible to save them?
One solution that has come to my mind is to open another terminal and run watch -n 1000 docker commit <container id> tag:label
That is, commit a snapshot every 1000 seconds. However, the weights obtained during the last epoch are still at risk, since epoch durations vary and are not a multiple of 1000 seconds.
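Spelled out, this workaround would look roughly like the following, where trainer stands in for my container's name or id and nn-snapshot is an arbitrary image name:

# take a snapshot of the running container every 1000 seconds,
# tagging each commit with a timestamp so earlier snapshots are kept
while true; do
    docker commit trainer nn-snapshot:"$(date +%s)"
    sleep 1000
done

The timestamped tags at least preserve every intermediate snapshot, but the last-epoch problem described above remains.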
Are there any more elegant solutions?
Additional information
The image for this container was created using the following Dockerfile:
FROM tensorflow-py3-gpu-keras
WORKDIR /root
COPY model4.py /root
COPY data_generator.py /root
COPY hyper_parameters.py /root
CMD python model4.py
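I build and run it with something like this (nn-trainer is just a placeholder tag, and I am leaving out any GPU-related options for brevity):

# build the image from the Dockerfile above and start a training container
docker build -t nn-trainer .
docker run nn-trainer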
I had manually created the tensorflow-py3-gpu-keras image from the latest tensorflow image pulled from Docker Hub: I started a container with docker run tensorflow, ran pip3 install keras inside it, and then ran docker commit from another terminal.
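In other words, roughly these commands (names and flags are approximate, from memory):

# start a container from the base image and get a shell in it
docker run -it tensorflow bash
# inside that container: add keras on top of tensorflow
pip3 install keras
# from another terminal on the host: freeze the container into a new image
docker commit <container id> tensorflow-py3-gpu-keras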