Multi-threaded BASH programming - generalized meth

2019-04-11 03:10发布

问题:

Ok, I was running POV-Ray on all the demos, but POV's still single-threaded and wouldn't utilize more than one core. So, I started thinking about a solution in BASH.

I wrote a general function that takes a list of commands and runs them in the designated number of sub-shells. This actually works but I don't like the way it handles accessing the next command in a thread-safe multi-process way:

  • It takes, as an argument, a file with commands (1 per line),
  • To get the "next" command, each process ("thread") will:
    • Waits until it can create a lock file, with: ln $CMDFILE $LOCKFILE
    • Read the command from the file,
    • Modifies $CMDFILE by removing the first line,
    • Removes the $LOCKFILE.

Is there a cleaner way to do this? I couldn't get the sub-shells to read a single line from a FIFO correctly.


Incidentally, the point of this is to enhance what I can do on a BASH command line, and not to find non-bash solutions. I tend to perform a lot of complicated tasks from the command line and want another tool in the toolbox.

Meanwhile, here's the function that handles getting the next line from the file. As you can see, it modifies an on-disk file each time it reads/removes a line. That's what seems hackish, but I'm not coming up with anything better, since FIFO's didn't work w/o setvbuf() in bash.

#
# Get/remove the first line from FILE, using LOCK as a semaphore (with
# short sleep for collisions).  Returns the text on standard output,
# returns zero on success, non-zero when file is empty.
#
parallel__nextLine() 
{
  local line rest file=$1 lock=$2

  # Wait for lock...
  until ln "${file}" "${lock}" 2>/dev/null
  do sleep 1
     [ -s "${file}" ] || return $?
  done

  # Open, read one "line" save "rest" back to the file:
  exec 3<"$file"
  read line <&3 ; rest=$(cat<&3)
  exec 3<&-

  # After last line, make sure file is empty:
  ( [ -z "$rest" ] || echo "$rest" ) > "${file}"

  # Remove lock and 'return' the line read:
  rm -f "${lock}"
  [ -n "$line" ] && echo "$line"
}

回答1:

#adjust these as required
args_per_proc=1 #1 is fine for long running tasks
procs_in_parallel=4

xargs -n$args_per_proc -P$procs_in_parallel povray < list

Note the nproc command coming soon to coreutils will auto determine the number of available processing units which can then be passed to -P



回答2:

If you need real thread safety, I would recommend to migrate to a better scripting system.

With python, for example, you can create real threads with safe synchronization using semaphores/queues.



回答3:

sorry to bump this after so long, but I pieced together a fairly good solution for this IMO
It doesnt work perfectly, but it will limit the script to a certain number of child tasks running, and then wait for all the rest at the end.

#!/bin/bash

pids=()
thread() {
  local this
  while [ ${#} -gt 6 ]; do
    this=${1}
    wait "$this"
    shift
  done
  pids=($1 $2 $3 $4 $5 $6)
}
for i in 1 2 3 4 5 6 7 8 9 10
do
  sleep 5 &
  pids=( ${pids[@]-} $(echo $!) )
  thread ${pids[@]}
done
for pid in ${pids[@]}
do
  wait "$pid"
done

it seems to work great for what I'm doing (handling parallel uploading of a bunch of files at once) and keeps it from breaking my server, while still making sure all the files get uploaded before it finishes the script



回答4:

I believe you're actually forking processes here, and not threading. I would recommend looking for threading support in a different scripting language like perl, python, or ruby.