When I submit a small TensorFlow training job as a single task, it launches additional threads. When I press Ctrl+C and KeyboardInterrupt is raised, my task is closed, but the underlying threads are not cleaned up and training continues.
Initially, I thought this was a TensorFlow problem (not cleaning up its threads), but after testing I understand that the problem comes from the Dask side, which probably doesn't propagate the SIGTERM signal to the task function. My question: how can I configure Dask to propagate the SIGTERM signal to the running task?
Example of desired flow:
Local process -> Press Ctrl + C -> Dask scheduler -> Dask worker -> SIGTERM signal -> Running single task with Tensorflow training.
Thank you.
P.S. If you need additional information, just ask.
Update:
Code example:

    from dask.distributed import Client

    c = Client('<remote-scheduler>')

    def task():
        # tensorflow training
        model = ...
        model.fit(x_train, y_train)

    training = c.submit(task)
    training.result()

Now, during training, when I press Ctrl+C the task is canceled, but the TensorFlow threads/processes remain.
Update 2:
Output of the ps -f -u [username] command.
Dask cluster (1 scheduler, 1 worker, same server), no running tasks:
UID PID PPID C STIME TTY TIME CMD
vladysl+ 16547 1 0 12:40 ? 00:00:00 /lib/systemd/systemd --user
vladysl+ 16550 16547 0 12:40 ? 00:00:00 (sd-pam)
vladysl+ 16805 16311 0 12:40 ? 00:00:00 sshd: vladyslav@pts/45
vladysl+ 16811 16805 0 12:40 pts/45 00:00:00 -bash
vladysl+ 18946 16811 4 12:41 pts/45 00:00:24 /home/vladyslav/miniconda3/envs/py3.6/bin/python /home/vladyslav/miniconda3/envs/py3.6/bin/dask-scheduler --port 42001
vladysl+ 22284 22175 0 12:46 ? 00:00:00 sshd: vladyslav@pts/38
vladysl+ 22285 22284 0 12:46 pts/38 00:00:00 -bash
vladysl+ 23138 16811 1 12:48 pts/45 00:00:03 /home/vladyslav/miniconda3/envs/py3.6/bin/python /home/vladyslav/miniconda3/envs/py3.6/bin/dask-worker localhost:42001 --worker-port 420011 --memory-limit $
vladysl+ 23143 23138 0 12:48 pts/45 00:00:00 /home/vladyslav/miniconda3/envs/py3.6/bin/python -c from multiprocessing.semaphore_tracker import main;main(11)
vladysl+ 23145 23138 0 12:48 pts/45 00:00:00 /home/vladyslav/miniconda3/envs/py3.6/bin/python -c from multiprocessing.forkserver import main; main(15, 16, ['distributed'], **{'sys_path': ['/home/vlady$
vladysl+ 23151 23145 99 12:48 pts/45 00:03:48 /home/vladyslav/miniconda3/envs/py3.6/bin/python -c from multiprocessing.forkserver import main; main(15, 16, ['distributed'], **{'sys_path': ['/home/vlady$
vladysl+ 23536 23151 0 12:49 pts/45 00:00:00 /home/vladyslav/miniconda3/envs/py3.6/bin/python -c from multiprocessing.semaphore_tracker import main;main(25)
vladysl+ 26150 22285 0 12:51 pts/38 00:00:00 ps -f -u vladyslav
During task running:
UID PID PPID C STIME TTY TIME CMD
vladysl+ 16547 1 0 12:40 ? 00:00:00 /lib/systemd/systemd --user
vladysl+ 16811 16805 0 12:40 pts/45 00:00:00 -bash
vladysl+ 18946 16811 4 12:41 pts/45 00:00:30 /home/vladyslav/miniconda3/envs/py3.6/bin/python /home/vladyslav/miniconda3/envs/py3.6/bin/dask-scheduler --port 42001
vladysl+ 22285 22284 0 12:46 pts/38 00:00:00 -bash
vladysl+ 23138 16811 1 12:48 pts/45 00:00:06 /home/vladyslav/miniconda3/envs/py3.6/bin/python /home/vladyslav/miniconda3/envs/py3.6/bin/dask-worker localhost:42001 --worker-port 420011 --memory-limit $
vladysl+ 23143 23138 0 12:48 pts/45 00:00:00 /home/vladyslav/miniconda3/envs/py3.6/bin/python -c from multiprocessing.semaphore_tracker import main;main(11)
vladysl+ 23145 23138 0 12:48 pts/45 00:00:00 /home/vladyslav/miniconda3/envs/py3.6/bin/python -c from multiprocessing.forkserver import main; main(15, 16, ['distributed'], **{'sys_path': ['/home/vlady$
vladysl+ 23151 23145 99 12:48 pts/45 00:07:55 /home/vladyslav/miniconda3/envs/py3.6/bin/python -c from multiprocessing.forkserver import main; main(15, 16, ['distributed'], **{'sys_path': ['/home/vlady$
vladysl+ 23536 23151 0 12:49 pts/45 00:00:00 /home/vladyslav/miniconda3/envs/py3.6/bin/python -c from multiprocessing.semaphore_tracker import main;main(25)
vladysl+ 27079 22285 0 12:54 pts/38 00:00:00 ps -f -u vladyslav
After pressing Ctrl+C, the task is canceled but TensorFlow continues working:
UID PID PPID C STIME TTY TIME CMD
vladysl+ 16811 16805 0 12:40 pts/45 00:00:00 -bash
vladysl+ 18946 16811 4 12:41 pts/45 00:00:31 /home/vladyslav/miniconda3/envs/py3.6/bin/python /home/vladyslav/miniconda3/envs/py3.6/bin/dask-scheduler --port 42001
vladysl+ 22285 22284 0 12:46 pts/38 00:00:00 -bash
vladysl+ 23138 16811 1 12:48 pts/45 00:00:06 /home/vladyslav/miniconda3/envs/py3.6/bin/python /home/vladyslav/miniconda3/envs/py3.6/bin/dask-worker localhost:42001 --worker-port 420011 --memory-limit $
vladysl+ 23143 23138 0 12:48 pts/45 00:00:00 /home/vladyslav/miniconda3/envs/py3.6/bin/python -c from multiprocessing.semaphore_tracker import main;main(11)
vladysl+ 23145 23138 0 12:48 pts/45 00:00:00 /home/vladyslav/miniconda3/envs/py3.6/bin/python -c from multiprocessing.forkserver import main; main(15, 16, ['distributed'], **{'sys_path': ['/home/vlady$
vladysl+ 23151 23145 99 12:48 pts/45 00:09:32 /home/vladyslav/miniconda3/envs/py3.6/bin/python -c from multiprocessing.forkserver import main; main(15, 16, ['distributed'], **{'sys_path': ['/home/vlady$
vladysl+ 23536 23151 0 12:49 pts/45 00:00:00 /home/vladyslav/miniconda3/envs/py3.6/bin/python -c from multiprocessing.semaphore_tracker import main;main(25)
vladysl+ 27117 22285 0 12:54 pts/38 00:00:00 ps -f -u vladyslav
As you can see, no new processes appear and none disappear: PID 23151 keeps accumulating CPU time after the task is canceled.
Dask does not support propagating signals from the client through to workers running tasks.
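
Since no signal reaches the task, one common workaround is cooperative cancellation: the task polls a stop flag between training steps and exits on its own once the flag is set. The sketch below is only an illustration of that pattern using the standard library. The names train, stop_flag and step_seconds are hypothetical, a threading.Event stands in for a flag shared across the cluster, and time.sleep stands in for one training step. With Dask, a shared object such as distributed.Variable or distributed.Event could plausibly play the role of the flag, and with Keras a callback that sets self.model.stop_training = True could do the polling, but that wiring is an assumption and is not shown here.

```python
import threading
import time


def train(stop_flag, max_steps=1000, step_seconds=0.01):
    """Stand-in for a training loop; returns the number of completed steps."""
    steps_done = 0
    for _ in range(max_steps):
        if stop_flag.is_set():    # poll the flag between steps
            break
        time.sleep(step_seconds)  # placeholder for one real training step
        steps_done += 1
    return steps_done


stop_flag = threading.Event()     # hypothetical stand-in for a cluster-wide flag
result = {}
worker = threading.Thread(target=lambda: result.update(steps=train(stop_flag)))
worker.start()

try:
    time.sleep(0.1)               # real code would block on the task result here
    raise KeyboardInterrupt       # simulates pressing Ctrl+C on the client
except KeyboardInterrupt:
    stop_flag.set()               # ask the task to stop; no signal is involved
finally:
    worker.join()                 # the loop exits at its next flag check

print("steps completed before stop:", result["steps"])
```

The design choice is that the task, not the scheduler, decides when to stop, so it can finish the current step and release its own threads before returning.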