I have a list of shell commands that I'd like to run. Up to four processes should run at the same time.
My basic idea would be to send commands to the shell until 4 commands are active.
The script would then constantly check the process count by looking for a common string, e.g. "nohup scrapy crawl urlMonitor".
As soon as the process count drops below 4, the next command is sent to the shell, until all commands have finished.
Is there a way to do this with a shell script?
I suppose it would involve some kind of endless loop with a break condition, as well as a method to check for the active processes. Unfortunately I am not that good at shell scripting, so perhaps someone can point me in the right direction? These are the commands I'd like to run:
nohup scrapy crawl urlMonitor -a slice=0 &
nohup scrapy crawl urlMonitor -a slice=1 &
nohup scrapy crawl urlMonitor -a slice=2 &
nohup scrapy crawl urlMonitor -a slice=3 &
nohup scrapy crawl urlMonitor -a slice=4 &
nohup scrapy crawl urlMonitor -a slice=5 &
nohup scrapy crawl urlMonitor -a slice=6 &
nohup scrapy crawl urlMonitor -a slice=7 &
nohup scrapy crawl urlMonitor -a slice=8 &
nohup scrapy crawl urlMonitor -a slice=9 &
nohup scrapy crawl urlMonitor -a slice=10 &
nohup scrapy crawl urlMonitor -a slice=11 &
nohup scrapy crawl urlMonitor -a slice=12 &
nohup scrapy crawl urlMonitor -a slice=13 &
nohup scrapy crawl urlMonitor -a slice=14 &
nohup scrapy crawl urlMonitor -a slice=15 &
nohup scrapy crawl urlMonitor -a slice=16 &
nohup scrapy crawl urlMonitor -a slice=17 &
nohup scrapy crawl urlMonitor -a slice=18 &
nohup scrapy crawl urlMonitor -a slice=19 &
nohup scrapy crawl urlMonitor -a slice=20 &
nohup scrapy crawl urlMonitor -a slice=21 &
nohup scrapy crawl urlMonitor -a slice=22 &
nohup scrapy crawl urlMonitor -a slice=23 &
nohup scrapy crawl urlMonitor -a slice=24 &
nohup scrapy crawl urlMonitor -a slice=25 &
nohup scrapy crawl urlMonitor -a slice=26 &
nohup scrapy crawl urlMonitor -a slice=27 &
nohup scrapy crawl urlMonitor -a slice=28 &
nohup scrapy crawl urlMonitor -a slice=29 &
nohup scrapy crawl urlMonitor -a slice=30 &
nohup scrapy crawl urlMonitor -a slice=31 &
nohup scrapy crawl urlMonitor -a slice=32 &
nohup scrapy crawl urlMonitor -a slice=33 &
nohup scrapy crawl urlMonitor -a slice=34 &
nohup scrapy crawl urlMonitor -a slice=35 &
nohup scrapy crawl urlMonitor -a slice=36 &
nohup scrapy crawl urlMonitor -a slice=37 &
nohup scrapy crawl urlMonitor -a slice=38 &
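To illustrate, this is roughly the polling loop I have in mind (an untested sketch; using pgrep to count the matching processes is just my guess at how to do the check):
#!/bin/bash
# Untested sketch of my idea: only start the next slice while fewer
# than 4 "scrapy crawl urlMonitor" processes are running.
for i in {0..38}; do
    while [ "$(pgrep -fc 'scrapy crawl urlMonitor')" -ge 4 ]; do
        sleep 5
    done
    nohup scrapy crawl urlMonitor -a slice=$i &
done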
Here's a general method that will always ensure that there are fewer than 4 jobs before launching any other jobs (though there may be more than 4 jobs running simultaneously if a single line launches several jobs at once):
#!/bin/bash

max_nb_jobs=4
commands_file=$1

while IFS= read -r line; do
    while :; do
        mapfile -t jobs < <(jobs -pr)
        ((${#jobs[@]}<max_nb_jobs)) && break
        wait -n
    done
    eval "$line"
done < "$commands_file"

wait
Use this script with your file as the first argument (note that wait -n requires Bash 4.3 or newer).
How does it work? For each line read, we first ensure that there are fewer than max_nb_jobs jobs running, by counting the running jobs (obtained from jobs -pr). If there are max_nb_jobs or more, we wait for the next job to terminate (wait -n) and count the running jobs again. Once fewer than max_nb_jobs are running, we eval the line.
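For example, you could generate the commands file from the slices in the question and run the script like this (limit_jobs.sh and commands.txt are just illustrative names):
# Write one command per line; keep the trailing "&" so each eval'd
# line starts a background job that the script can count with jobs -pr.
for i in {0..38}; do
    echo "nohup scrapy crawl urlMonitor -a slice=$i &"
done > commands.txt

# limit_jobs.sh is whatever you saved the script above as
./limit_jobs.sh commands.txt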
Update
Here's a similar script that doesn't use wait -n. It seems to do the job all right (tested on Debian with Bash 4.2):
#!/bin/bash

set -m

max_nb_jobs=4
file_list=$1

sleep_jobs() {
    # This function sleeps until there are less than $1 jobs running
    # Make sure that you have set -m before using this function!
    local n=$1 jobs
    while mapfile -t jobs < <(jobs -pr) && ((${#jobs[@]}>=n)); do
        coproc read
        trap "echo >&${COPROC[1]}; trap '' SIGCHLD" SIGCHLD
        wait $COPROC_PID
    done
}

while IFS= read -r line; do
    sleep_jobs $max_nb_jobs
    eval "$line"
done < "$file_list"

wait
If you want 4 at a time continuously running, try something like:
max_procs=4
active_procs=0

for proc_num in {0..38}; do
    nohup your_cmd_here &

    # If we have more than max procs running, wait for one to finish
    if ((active_procs++ >= max_procs)); then
        wait -n
        ((active_procs--))
    fi
done

# Wait for all remaining procs to finish
wait
This is a variation on sputnick's answer that keeps up to max_procs running at the same time. As soon as one finishes, it kicks off the next one. The wait -n command waits for the next process to finish instead of waiting for all of them to finish.
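Substituting the crawl command from the question for the your_cmd_here placeholder, that sketch would become something like:
max_procs=4
active_procs=0

for proc_num in {0..38}; do
    nohup scrapy crawl urlMonitor -a slice=$proc_num &

    # After max_procs crawls have been started, wait for one to exit
    # before launching the next (wait -n needs Bash 4.3 or newer)
    if ((active_procs++ >= max_procs)); then
        wait -n
        ((active_procs--))
    fi
done

wait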
You could do this easily with GNU parallel or even just xargs. To wit:
declare -i i=0
while sleep 1; do
    printf 'slice=%d\n' $((i++))
done | xargs -n1 -P4 nohup scrapy crawl urlMonitor -a
The while loop will run forever; if there's an actual hard limit you know of, you can just do a for loop like:
for i in {0..100}…
Also, the sleep 1 is helpful because it lets the shell handle signals more effectively.
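For the 39 slices in the question, the bounded version of that pipeline could look like this (same idea, with -P4 for four parallel crawls):
for i in {0..38}; do
    printf 'slice=%d\n' "$i"
done | xargs -n1 -P4 nohup scrapy crawl urlMonitor -a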
Try doing this:
for i in {0..38}; do
    nohup scrapy crawl urlMonitor -a slice=$i & _pid=$!
    ((++i%4==0)) && wait $_pid
done
help wait:
wait: wait [-n] [id ...]
    Wait for job completion and return exit status.

    Waits for each process identified by an ID, which may be a process ID or a
    job specification, and reports its termination status. If ID is not
    given, waits for all currently active child processes, and the return
    status is zero. If ID is a job specification, waits for all processes
    in that job's pipeline.

    If the -n option is supplied, waits for the next job to terminate and
    returns its exit status.

    Exit Status:
    Returns the status of the last ID; fails if ID is invalid or an invalid
    option is given.
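If your Bash has wait -n (4.3 or newer), the batch-style wait above can be replaced with a sliding window that keeps four crawls running continuously; a rough sketch along the same lines:
running=0
for i in {0..38}; do
    nohup scrapy crawl urlMonitor -a slice=$i &
    # Once four crawls are in flight, wait for any one to finish
    # before starting the next
    ((++running >= 4)) && { wait -n; ((running--)); }
done
wait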