The setting:
I have a hundred or so files, named something like input0.dat, input1.dat, ..., input150.dat, which I need to process using some command cmd (which basically merges the contents of all files). The cmd takes the output filename as its first argument, followed by a list of all input filenames:
./cmd output.dat input1.dat input2.dat [...] input150.dat
The problem:
The problem is that the script can only handle about 10 files at a time due to memory issues (don't blame me for that). Thus, instead of using bash wildcard expansion like
./cmd output.dat *dat
I need to do something like
./cmd temp_output0.dat file0.dat file1.dat [...] file9.dat
[...]
./cmd temp_outputN.dat fileN0.dat fileN1.dat [...] fileN9.dat
Afterwards I can merge the temporary outputs:
./cmd output.dat temp_output0.dat [...] temp_outputN.dat
How do I script this efficiently in bash?
I tried, without success, e.g.
for filename in `echo *dat | xargs -n 3`; do [...]; done
The problem is that this again processes all files at once, because the output lines of xargs get concatenated.
EDIT: Note that I need to specify an output filename as the first command line argument when calling cmd!
I know that this question was answered and accepted a long time ago, but I find that there is a simpler solution than those offered so far.
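A minimal sketch of the idea, assuming the batching is done with xargs -n. It is a dry run: echo prints each ./cmd call instead of executing it, and no output filename is handled yet. The 25 empty demo files and the temporary directory are made up for illustration.

```bash
#!/usr/bin/env bash
# Dry run: print the ./cmd calls instead of executing them.
workdir=$(mktemp -d) && cd "$workdir" || exit 1
for i in {0..24}; do : > "input$i.dat"; done   # 25 empty demo files (hypothetical)

# xargs -n 10 starts one command per group of at most 10 names,
# so every printed line corresponds to one batch.
batches=$(printf '%s\n' input*.dat | xargs -n 10 echo ./cmd)
printf '%s\n' "$batches"
```

With 25 files this prints three lines: two batches of ten names and one of five.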
For more fine-grained control, or to manipulate the argument string further, use the following form (substitute bash with the shell of your choice):
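One way this form could look, as a sketch: xargs hands each batch to bash -c, which builds a distinct output name per batch and passes it as the first argument. The cmd script here is a hypothetical stand-in that just concatenates its inputs, and the temp_output_ naming scheme is an assumption.

```bash
#!/usr/bin/env bash
workdir=$(mktemp -d) && cd "$workdir" || exit 1
for i in {0..24}; do echo "data $i" > "input$i.dat"; done   # demo inputs

# Hypothetical stand-in for the real ./cmd: the first argument is the
# output file, the remaining arguments are concatenated into it.
cat > cmd <<'EOF'
#!/bin/sh
out=$1; shift
cat "$@" > "$out"
EOF
chmod +x cmd

# Each batch of at most 10 names becomes "$@" inside bash -c ("_" fills $0).
# The output name is derived from the first input of the batch, so it is
# unique per batch.
printf '%s\n' input*.dat |
    xargs -n 10 bash -c './cmd "temp_output_${1%.dat}.dat" "$@"' _

# Merge the temporary outputs, then remove them.
./cmd output.dat temp_output_*.dat
rm -f temp_output_*.dat
```

The glob is input*.dat rather than *dat so that the temporary and final outputs are never picked up as inputs.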
To run the batches in parallel (say, two at a time):
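A sketch of the parallel variant, assuming an xargs that supports -P (GNU and BSD xargs do; it is not part of POSIX). As before, cmd is a hypothetical stand-in and the temp_output_ names are an assumption.

```bash
#!/usr/bin/env bash
workdir=$(mktemp -d) && cd "$workdir" || exit 1
for i in {0..24}; do echo "data $i" > "input$i.dat"; done   # demo inputs

# Hypothetical stand-in for the real ./cmd (output file first, then inputs).
cat > cmd <<'EOF'
#!/bin/sh
out=$1; shift
cat "$@" > "$out"
EOF
chmod +x cmd

# -P 2 keeps at most two batches running at the same time; xargs only
# returns once every batch has finished.
printf '%s\n' input*.dat |
    xargs -n 10 -P 2 bash -c './cmd "temp_output_${1%.dat}.dat" "$@"' _

# All batches are done here, so the merge sees complete temporary files.
./cmd output.dat temp_output_*.dat
rm -f temp_output_*.dat
```

Keep in mind that two batches running at once also means roughly twice the memory in use.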
NOTE: This will not work for file names that contain spaces.
Try the following, it should work for you:
EDIT: In response to your comment:
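A possible shape for such a loop, shown as a dry run that only prints the calls. It assumes the files are grouped by their leading digit and that each group gets its own output name; whether repeated calls with the same output name accumulate or overwrite depends on cmd, so check that before replacing echo with the real thing.

```bash
#!/usr/bin/env bash
workdir=$(mktemp -d) && cd "$workdir" || exit 1
for i in {0..9}{0..9}; do : > "file$i.dat"; done   # file00.dat .. file99.dat

# One group per leading digit; within a group, xargs -n 3 hands over at
# most three names per call. echo makes this a dry run.
plan=$(
    for i in {0..9}; do
        printf '%s\n' "file$i"?.dat | xargs -n 3 echo ./cmd "output$i.dat"
    done
)
printf '%s\n' "$plan"
```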
That would send no more than three files at a time to ./cmd, while going over all files from file00.dat to file99.dat, and having 10 different output files, output0.dat to output9.dat.
You can do:
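A sketch of the fifo approach, under a few assumptions: cmd is a hypothetical stand-in, the batch size is 10, and the fifo name and temp_output naming are made up. xargs without a command defaults to echo, so it writes one line per batch into the fifo.

```bash
#!/usr/bin/env bash
workdir=$(mktemp -d) && cd "$workdir" || exit 1
for n in {0..24}; do echo "data $n" > "input$n.dat"; done   # demo inputs

# Hypothetical stand-in for the real ./cmd (output file first, then inputs).
cat > cmd <<'EOF'
#!/bin/sh
out=$1; shift
cat "$@" > "$out"
EOF
chmod +x cmd

fifo=$workdir/batches.fifo
mkfifo "$fifo"

# Writer: one line per batch of at most 10 names, sent through the fifo.
printf '%s\n' input*.dat | xargs -n 10 > "$fifo" &

i=0
opfiles=()
# The loop reads from a redirection, not a pipe, so it runs in the
# current shell and i and opfiles keep their values afterwards.
while read -r batch; do
    # $batch is unquoted on purpose so it splits into file names;
    # </dev/null keeps cmd from consuming the batch list on stdin.
    ./cmd "temp_output$i.dat" $batch < /dev/null
    opfiles+=("temp_output$i.dat")
    i=$((i + 1))
done < "$fifo"
wait            # reap the background writer
rm -f "$fifo"

# Final merge of the temporary outputs.
./cmd output.dat "${opfiles[@]}"
rm -f "${opfiles[@]}"
```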
You need to use a fifo to keep the value of the i variable, as well as the set of files for the final concatenation.
If you want, you can background the inner invocation of ./cmd and put a wait before the last invocation of cmd.
Update: If you want to avoid using a fifo entirely, you can use process substitution to emulate it, rewriting the first one as:
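A sketch of the process substitution version, with the same assumptions as before (hypothetical stand-in for cmd, batches of 10, made-up temp_output names):

```bash
#!/usr/bin/env bash
workdir=$(mktemp -d) && cd "$workdir" || exit 1
for n in {0..24}; do echo "data $n" > "input$n.dat"; done   # demo inputs

# Hypothetical stand-in for the real ./cmd (output file first, then inputs).
cat > cmd <<'EOF'
#!/bin/sh
out=$1; shift
cat "$@" > "$out"
EOF
chmod +x cmd

i=0
opfiles=()
while read -r batch; do
    # $batch is unquoted on purpose so it splits into file names.
    ./cmd "temp_output$i.dat" $batch < /dev/null
    opfiles+=("temp_output$i.dat")
    i=$((i + 1))
done < <(printf '%s\n' input*.dat | xargs -n 10)   # <( ) replaces the fifo

# Final merge of the temporary outputs.
./cmd output.dat "${opfiles[@]}"
rm -f "${opfiles[@]}"
```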
This again avoids piping into the while loop, reading from a redirection instead, so that the opfiles variable keeps its value after the loop.
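The backgrounded variant mentioned above (each inner ./cmd started with &, then a wait before the final merge) might look like the sketch below. Same assumptions as before: cmd is a hypothetical stand-in and the names are made up. Note that all batches start at once here, so peak memory use grows with the number of batches.

```bash
#!/usr/bin/env bash
workdir=$(mktemp -d) && cd "$workdir" || exit 1
for n in {0..24}; do echo "data $n" > "input$n.dat"; done   # demo inputs

# Hypothetical stand-in for the real ./cmd (output file first, then inputs).
cat > cmd <<'EOF'
#!/bin/sh
out=$1; shift
cat "$@" > "$out"
EOF
chmod +x cmd

i=0
opfiles=()
while read -r batch; do
    # Start this batch in the background and move on to the next one.
    ./cmd "temp_output$i.dat" $batch < /dev/null &
    opfiles+=("temp_output$i.dat")
    i=$((i + 1))
done < <(printf '%s\n' input*.dat | xargs -n 10)

# Every background ./cmd must have finished before the merge reads
# its temporary output.
wait
./cmd output.dat "${opfiles[@]}"
rm -f "${opfiles[@]}"
```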