Running shell script in parallel

I have a shell script which

shuffles a large text file (6 million rows and 6 columns)
sorts the file based on the first column
outputs 1000 files

So the pseudocode looks like this

file1.sh 

#!/bin/bash
for i in $(seq 1 1000)
do

  Generating random numbers here , sorting  and outputting to file$i.txt  

done

Is there a way to run this shell script in parallel to make full use of multi-core CPUs?

At the moment, ./file1.sh runs iterations 1 to 1000 one after another, and it is very slow.

Thanks for your help.

Tags: linux bash shell unix parallel-processing

7 answers

我命由我不由天

#2 · 2019-01-08 06:29

Check out bash subshells; they can be used to run parts of a script in parallel.

I haven't tested this, but this could be a start:

#!/bin/bash
for i in $(seq 1 1000)
do
   ( Generating random numbers here , sorting  and outputting to file$i.txt ) &
   if (( $i % 10 == 0 )); then wait; fi # Limit to 10 concurrent subshells.
done
wait
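
A concrete version of that loop might look like the following. It is a minimal sketch rather than the asker's real script: the function name one_run, the small input.txt, the sample size and the count of 20 runs are all placeholders, and it assumes GNU shuf and sort are available.

```shell
#!/bin/bash
# Hypothetical stand-in for the real input (the actual file has ~6 million rows).
seq 1 100 | awk '{ print $1, $1 * 2 }' > input.txt

# Hypothetical worker: take a random sample of the input, sort it
# numerically on the first column, and write it to file<N>.txt.
one_run() {
    shuf -n 10 input.txt | sort -k1,1n > "file$1.txt"
}

max_jobs=4   # assumed limit; $(nproc) is a reasonable choice on Linux

for i in $(seq 1 20); do
    one_run "$i" &
    # After every max_jobs launches, wait for the whole batch to finish.
    if (( i % max_jobs == 0 )); then wait; fi
done
wait   # collect any jobs left over from the last partial batch
```

Batching like this is simple, but every batch is as slow as its slowest job, so cores can sit idle near the end of each batch.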


我欲成王，谁敢阻挡

#3 · 2019-01-08 06:43

Generating random numbers is easy. Suppose you have a huge file, such as a shop database, and you want to rewrite it according to some specific rule. My idea was to count the cores, split the file into as many parts as there are cores, and use three files: script.cfg, split.sh and recombine.sh. split.sh splits the file into one part per core, clones script.cfg (the script that changes things in the huge file) once per core, makes the clones executable, searches and replaces the variables in each clone that tell it which part of the file to process, and runs the clones in the background. When a clone is done it generates a clone$core.ok file, so a loop recombines the partial results into a single file only once all the .ok files have been generated. It can be done with "wait", but I fancy my way.

http://www.linux-romania.com/product.php?id_product=76 (look at the bottom; it is partially translated into English). This way I can process 20000 articles with 16 columns in 2 minutes on a quad core instead of 8 on a single core. You have to watch the CPU temperature, because all cores are running at 100%.
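
The split, process and recombine idea can be sketched much more briefly using "wait" in place of the .ok marker files. This is not the poster's split.sh or recombine.sh; the file names big.txt, chunk.* and combined.txt and the doubling job are made up for illustration, and split -n is a GNU coreutils option.

```shell
#!/bin/bash
# Hypothetical input: 1000 lines standing in for the large file.
seq 1 1000 > big.txt

# One chunk per core; fall back to 2 if nproc is not available.
cores=$(nproc 2>/dev/null || echo 2)

# Split into $cores chunks without breaking lines (GNU split).
split -n "l/$cores" big.txt chunk.

# Hypothetical per-chunk job: double every number, one background job per chunk.
for part in chunk.??; do
    awk '{ print $1 * 2 }' "$part" > "$part.out" &
done
wait   # block until every background job has finished

# Chunk suffixes sort alphabetically, so the glob restores the original order.
cat chunk.??.out > combined.txt
```

Because the chunks are processed independently, this only works when each line can be handled without looking at lines in other chunks.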


做个烂人

#4 · 2019-01-08 06:44

There is a simple, portable program that does just this for you: PPSS. PPSS schedules jobs automatically by checking how many cores are available and launching a new job each time a running one finishes.
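
PPSS's own command line is not shown here. The scheduling behaviour described (keep one job per core running and start the next as soon as one exits) can be approximated with a different tool, xargs -P. The sketch below assumes bash and an xargs that supports -P; the function one_job and the xout directory are hypothetical.

```shell
#!/bin/bash
# Hypothetical job: write the square of its argument to xout/job<N>.txt.
one_job() {
    echo $(( $1 * $1 )) > "xout/job$1.txt"
}
export -f one_job   # make the function visible to the bash started by xargs

mkdir -p xout
cores=$(nproc 2>/dev/null || echo 2)   # fall back to 2 if nproc is missing

# -P keeps up to $cores jobs running and starts the next as soon as one exits;
# -n 1 passes one number to each invocation, where it becomes $1.
seq 1 12 | xargs -P "$cores" -n 1 bash -c 'one_job "$1"' _
```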


在下西门庆

#5 · 2019-01-08 06:50

The documentation for GNU parallel contains a whole list of programs that can run jobs in parallel from a shell, including comparisons between them. There are many, many solutions out there. The other good news is that they are probably quite efficient at scheduling jobs, so all the cores/processors are kept busy at all times.
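
For GNU parallel itself, a run might look roughly like this. It is a sketch under assumptions: the pout directory and the toy job are made up, and the loop in the else branch is only there so the example still runs on machines without GNU parallel.

```shell
#!/bin/bash
mkdir -p pout
cores=$(nproc 2>/dev/null || echo 2)   # fall back to 2 if nproc is missing

if parallel --version 2>/dev/null | grep -q 'GNU parallel'; then
    # {} is replaced by each input line; -j caps the number of simultaneous jobs.
    seq 1 12 | parallel -j "$cores" 'echo {} $(( {} * 2 )) > pout/run{}.txt'
else
    # Sequential fallback so the sketch still works without GNU parallel.
    for i in $(seq 1 12); do
        echo "$i $(( i * 2 ))" > "pout/run$i.txt"
    done
fi
```

For the original question, the quoted command would be replaced by whatever produces one output file, with seq 1 1000 supplying the run numbers.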


【Aperson】

#6 · 2019-01-08 06:51

IDLE_CPU=1
NCPU=$(nproc)

int_childs() {
    trap - INT
    while IFS=$'\n' read -r pid; do
        kill -s SIGINT -$pid
    done < <(jobs -p -r)
    kill -s SIGINT -$$
}

# cmds is an array that holds the commands to run.
# The complex part is display, which handles all command output
# and serializes it correctly.

trap int_childs INT
{
    exec 2>&1
    set -m

    if [ $NCPU -gt $IDLE_CPU ]; then
        for cmd in "${cmds[@]}"; do
            $cmd &
            while [ $(jobs -pr |wc -l) -ge $((NCPU - IDLE_CPU)) ]; do
                wait -n   # wait for any one job to finish (needs bash 4.3+)
            done
        done
        wait

    else
        for cmd in "${cmds[@]}"; do
            $cmd
        done
    fi
} | display

