I have a list/queue of 200 commands that I need to run in a shell on a Linux server.
I only want to have a maximum of 10 processes running (from the queue) at once. Some processes will take a few seconds to complete, other processes will take much longer.
When a process finishes I want the next command to be "popped" from the queue and executed.
Does anyone have code to solve this problem?
Further elaboration:
There's 200 pieces of work that need to be done, in a queue of some sort. I want to have at most 10 pieces of work going on at once. When a thread finishes a piece of work it should ask the queue for the next piece of work. If there's no more work in the queue, the thread should die. When all the threads have died it means all the work has been done.
The actual problem I'm trying to solve is using imapsync
to synchronize 200 mailboxes from an old mail server to a new mail server. Some users have large mailboxes and take a long time to sync; others have very small mailboxes and sync quickly.
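The queue/worker model the question describes can be sketched in bash. This is a minimal sketch under a few assumptions: it needs bash >= 4.3 for `wait -n`, the file names in /tmp are placeholders of my choosing, and a 20-line demo queue stands in for the real list of 200 imapsync commands.

```shell
# Build a demo queue of 20 quick commands standing in for the
# real list of 200 imapsync invocations (paths are placeholders).
for i in $(seq 1 20); do
  echo "echo synced mailbox $i"
done > /tmp/queue.txt

max_jobs=10
: > /tmp/done.log
while IFS= read -r cmd; do
  # If 10 jobs are already running, block until any one of them
  # exits. `wait -n` requires bash >= 4.3.
  while [ "$(jobs -rp | wc -l)" -ge "$max_jobs" ]; do
    wait -n
  done
  bash -c "$cmd" >> /tmp/done.log &
done < /tmp/queue.txt
wait   # drain the jobs still running once the queue is empty
```

Each time a child exits, the next command is popped off the queue, so at most 10 run at once and the final `wait` returns when all the work is done.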
Well, if they are largely independent of each other, I'd think in terms of: initialize a queue of all the jobs; start the first 10; then loop, waiting for a child to die and launching the next job from the queue, until the queue is empty and nothing is left running.
Note that the tail test in the main loop means the loop keeps iterating while jobs are still running, even after the queue itself is empty - preventing premature termination of the loop. I think the logic is sound.
I can see how to do that in C fairly easily. It wouldn't be all that hard in Perl, either (and therefore not too hard in the other scripting languages - Python, Ruby, Tcl, etc.). I'm not at all sure I'd want to do it in shell, though: the wait command in shell waits for all children to terminate, rather than for some one child to terminate.

Here is a simple function in zsh to parallelize jobs in not more than 4 subshells, using lock files in /tmp.
The only non-trivial part is the glob flags in the first test:

#q : enables filename globbing inside a test
[4] : returns the 4th result only
N : ignores the error on an empty result

It should be easy to convert it to POSIX, though it would be a bit more verbose.
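A rough POSIX-style counterpart of the lock-file idea might look like the following. This is a sketch, not the original function: the function name and the /tmp paths are my own, and it uses mkdir rather than plain lock files because mkdir is atomic and so doubles as a test-and-set.

```shell
# Run "$@" as a background job, but only once one of 4 lock
# directories in /tmp can be claimed; at most 4 jobs run at once.
run_limited() {
  while :; do
    for slot in 1 2 3 4; do
      if mkdir "/tmp/joblock.$slot" 2>/dev/null; then
        # Claimed a slot: run the job, then release the slot.
        ( "$@"; rmdir "/tmp/joblock.$slot" ) &
        return 0
      fi
    done
    sleep 1   # all 4 slots busy; poll again
  done
}
```

Calling, say, run_limited imapsync ... for each mailbox would then keep at most four syncs going at a time.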
Do not forget to escape any quotes in the jobs with \".

Parallel is made exactly for this purpose.
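A minimal invocation might look like this sketch (the file name is a placeholder, and the block skips itself if GNU Parallel is not installed): with no command given, parallel executes each input line as a shell command, and -j sets the number of jobs run simultaneously.

```shell
# One shell command per line (file name is a placeholder).
printf 'echo first\necho second\n' > /tmp/cmds.txt

# Run the queued commands, at most 10 at a time.
# Skip gracefully if GNU Parallel is not installed.
if command -v parallel >/dev/null 2>&1; then
  parallel -j 10 < /tmp/cmds.txt
fi
```

For the mailbox case, the file would hold one imapsync command per line.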
One of the beauties of Parallel compared to other solutions is that it makes sure output is not mixed. Doing traceroute in Parallel works fine, for example.

Python's multiprocessing module would seem to fit your issue nicely. It's a high-level package that supports process-based parallelism through an API similar to the threading module.
On the shell, xargs can be used to queue parallel command processing. For example, you can keep 3 sleeps running in parallel, sleeping for 1 second each, until 10 sleeps have executed in total; the whole run then takes 4 seconds. Likewise, if you have a list of names and want to pass each name to a command, again executing 3 commands in parallel, xargs can do that.
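Hedged sketches of both invocations (process_name is the placeholder command name from the text; here it is stubbed out as a tiny script so the example is self-contained):

```shell
# Ten 1-second sleeps, at most three at a time: the batch
# finishes in about ceil(10/3) = 4 seconds instead of 10.
echo 1 1 1 1 1 1 1 1 1 1 | xargs -n1 -P3 sleep

# Stub standing in for the real per-name command (illustrative).
cat > /tmp/process_name <<'EOF'
#!/bin/sh
echo "processing $1"
EOF
chmod +x /tmp/process_name

# One name per line, three invocations running in parallel.
printf '%s\n' alice bob charlie | xargs -n1 -P3 /tmp/process_name
```

-n1 passes one argument per command invocation, and -P3 caps the number of simultaneous processes at three.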
It would execute the command process_name alice, process_name bob, and so on.

If you are going to use Python, I recommend using Twisted for this.
Specifically Twisted Runner.