I have a command I want to run on all of the files in a folder, and the command's syntax looks like this:
tophat -o <output_file> <input_file>
What I would like is a script that loops over all the files in an arbitrary folder and uses the input file names to create similar, but different, output file names. The file names look like this:
Input name              Desired output name
path/to/sample1.fastq   path/to/sample1.bam
path/to/sample2.fastq   path/to/sample2.bam
Getting the input to work seems simple enough:
for f in *.fastq
do
tophat -o <output_file> "$f"
done
I tried using output=${f,.fastq,.bam} and passing that as the output parameter, but it doesn't work. All I get is an error: line 3: ${f,.fastq,.bam}: bad substitution. Is this the right way to do what I want, or should I do something else? If it's the right way, what am I doing wrong?
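For reference: `,` inside `${...}` is not a search-and-replace operator (in bash 4 and later it is the lowercase case-modification operator), so it cannot rename the file. Bash does this with pattern substitution, `${f/.fastq/.bam}`, or with suffix removal, `${f%.fastq}.bam`. A minimal sketch, assuming bash and `tophat` on the PATH; `bam_name` and `run_all` are hypothetical helper names, and the folder is taken as the first argument (defaulting to the current directory):

```shell
#!/usr/bin/env bash
# Sketch: run tophat on every *.fastq file in a folder (hypothetical helpers).

# Print the output name for an input: strip the trailing ".fastq", append ".bam".
bam_name() {
    printf '%s\n' "${1%.fastq}.bam"
}

# Loop over all .fastq files in the given folder (default: current directory).
run_all() {
    local dir=${1:-.}
    local f
    for f in "$dir"/*.fastq; do
        # If nothing matches, the glob is left as a literal string, so skip it.
        [ -e "$f" ] || continue
        tophat -o "$(bam_name "$f")" "$f"
    done
}
```

For example, `run_all path/to` would run `tophat -o path/to/sample1.bam path/to/sample1.fastq` and the same for sample2. `%` is used instead of `/` so that only a trailing `.fastq` is replaced, not the first occurrence anywhere in the path.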
[EDIT]:
Thanks for all the answers! A bonus question, though: what if my files are named like this instead:
path/to/sample1_1.fastq
path/to/sample1_2.fastq
path/to/sample2_1.fastq
path/to/sample2_2.fastq
...
... where I can have an arbitrary number of samples (sampleX), but each of them has two associated files (_1 and _2). The command now looks like this:
tophat -o <output_file> <input_1> <input_2>
So there's still just one output, for which I could do something like "${f/_[1-2].fastq/.bam}", but I'm unsure how to write a loop that iterates only once per sampleX while still taking both of its associated files... Ideas?
[EDIT #2]:
So, this is the final script that did the trick!
for f in *_1.fastq
do
tophat -o "${f/_1.fastq/.bam}" "$f" "${f/_1.fastq/_2.fastq}"
done
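A slightly more defensive variant of the paired loop, assuming bash: it iterates only over the `_1` files so each sample is visited once, strips the suffix with `${r1%_1.fastq}` (which only touches the end of the name), builds the mate's name from that prefix, and skips samples whose `_2` file is missing. `run_pairs` is a hypothetical helper name, and the folder argument defaults to the current directory.

```shell
#!/usr/bin/env bash
# Sketch: run tophat once per sample on its _1/_2 pair (hypothetical helper).

run_pairs() {
    local dir=${1:-.}
    local r1 r2 prefix
    # Iterate only over the _1 files, so each sample is visited exactly once.
    for r1 in "$dir"/*_1.fastq; do
        [ -e "$r1" ] || continue       # glob matched nothing
        prefix=${r1%_1.fastq}          # path/to/sample1_1.fastq -> path/to/sample1
        r2=${prefix}_2.fastq           # path/to/sample1 -> path/to/sample1_2.fastq
        if [ ! -e "$r2" ]; then
            echo "skipping $prefix: missing $r2" >&2
            continue
        fi
        tophat -o "${prefix}.bam" "$r1" "$r2"
    done
}
```

Calling `run_pairs path/to` would then run `tophat -o path/to/sample1.bam path/to/sample1_1.fastq path/to/sample1_2.fastq`, and likewise for every other sample that has both files.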
You can use:
Testing:
Alternative to anubhava's concise solution,
Not an answer but a suggestion: as a bioinformatician, you should use GNU make and its -j option (number of parallel jobs). The Makefile would be: