How to make GNU parallel split multiple input files?

Posted 2019-08-06 13:56

Question:

I have a script which takes three arguments and is run like this:

myscript.sh input1.fa input2.fa out.txt

The script reads one line each from input1.fa and input2.fa, does some comparison, and writes the result to out.txt. The two inputs are required to have the same number of lines, and out.txt will also have the same number of lines after the script finishes.

Is it possible to parallelize this using GNU parallel?

I do not care that the output has a different order from the inputs, but I do need to compare the ith line of input1.fa with the ith line of input2.fa. Also, it is acceptable if I get multiple output files (like output{#}) instead of one -- I'll just cat them together.

I found this topic, but the answer wasn't quite what I wanted. I know I can split the two input files and process them in parallel in pairs using xargs, but would like to do this in one line if possible...

Answer 1:

If you can change myscript.sh so that it reads from standard input and writes to standard output, you can do:

paste input1.fa input2.fa | parallel --pipe myscript.sh > out.txt

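In myscript.sh you then need to read each line from STDIN and split it on TAB to recover the line from input1.fa and the matching line from input2.fa. paste joins the ith lines of the two files with a TAB, and parallel --pipe splits its input only at line boundaries, so each pair stays together no matter which job receives it.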
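
As a rough sketch of what the pipe-based version of myscript.sh could look like: the function below reads TAB-joined pairs from STDIN and writes one result line per pair to STDOUT. The name compare_pairs and the "same"/"different" comparison are placeholders invented for illustration, since the question does not show the real comparison logic. The sketch also assumes the lines of input1.fa contain no TAB characters.

```shell
# Hypothetical pipe-based core of myscript.sh. Each record on stdin is
# "<line of input1.fa><TAB><line of input2.fa>", as produced by paste.
tab=$(printf '\t')

compare_pairs() {
    while IFS= read -r line; do
        seq1=${line%%"$tab"*}   # text before the first TAB (from input1.fa)
        seq2=${line#*"$tab"}    # text after the first TAB (from input2.fa)
        # Placeholder comparison (an assumption, not the real logic):
        # report whether the two lines are identical.
        if [ "$seq1" = "$seq2" ]; then
            printf 'same\n'
        else
            printf 'different\n'
        fi
    done
}

# Called at the end of myscript.sh, the function makes this pipeline work:
#   paste input1.fa input2.fa | parallel --pipe myscript.sh > out.txt
```

Splitting with parameter expansion rather than IFS keeps an empty first field intact, which a TAB-based IFS split would silently drop. Every input line produces exactly one output line, so out.txt ends up with the same number of lines as the inputs.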