Within a certain directory I have many directories, each containing a bunch of text files. I'm trying to write a script that, in each directory, concatenates the files whose names contain the string 'R1' into one file within that directory, and the files whose names contain 'R2' into another. This is what I wrote, but it's not working:
#!/bin/bash
for f in */*.fastq; do
    if grep 'R1' $f ; then
        cat "$f" >> R1.fastq
    fi
    if grep 'R2' $f ; then
        cat "$f" >> R2.fastq
    fi
done
I get no errors, and the files are created as intended, but they are empty. Can anyone tell me what I'm doing wrong?
Thank you all for the fast and detailed responses! I think I wasn't very clear in my question: I need the script to concatenate only the files within each specific directory, so that each directory gets its own new R1 and R2 file. I tried doing
cat */*R1*.fastq > */R1.fastq
but it gave me an "ambiguous redirect" error. I also tried Charles Duffy's for loop, looping over the directories with a nested loop to run through each file within a directory, like so:
for f in */; do
    for d in "$f"/*.fastq; do
        case "$d" in
            *R1*) cat "$d" >&3
            *R2*) cat "$d" >&4
        esac
    done 3>R1.fastq 4>R2.fastq
done
but it was giving an "unexpected token" error regarding ')'.
Sorry in advance if I'm missing something elementary; I'm still very new to bash.
Your grep is searching the file contents instead of the file name. You could rewrite it this way:
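A minimal sketch of such a rewrite, testing the file name with a bash pattern match instead of grepping the contents:

#!/bin/bash
for f in */*.fastq; do
    # Match against the file *name*; grep was reading the file *contents*
    if [[ $f == *R1* ]]; then
        cat "$f" >> R1.fastq
    elif [[ $f == *R2* ]]; then
        cat "$f" >> R2.fastq
    fi
done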
A Note To The Reader

Please review edit history on the question in considering this answer; several parts have been made less relevant by question edits.
One cat Per Output File

For the purpose at hand, you can probably just let shell globbing do all the work (if R1 or R2 will be in the filenames, as opposed to the directory names):
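For example (a sketch; the concatenated outputs land in the top-level directory):

cat */*R1*.fastq > R1.fastq
cat */*R2*.fastq > R2.fastq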
One find Per Output File

If it's a really large number of files, by contrast, you might need find:
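A sketch of that approach, assuming -mindepth 2 so that input comes only from the subdirectories (the minimum depth comes up again in the per-directory section below):

# -mindepth 2: only files inside subdirectories; -exec ... + batches arguments
find . -mindepth 2 -name '*R1*.fastq' -exec cat '{}' + > R1.fastq
find . -mindepth 2 -name '*R2*.fastq' -exec cat '{}' + > R2.fastq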
...this is because of the OS-dependent limit on command-line length; the find command given above will put as many arguments onto each cat command as possible for efficiency, but will still split them up into multiple invocations where otherwise the limit would be exceeded.

Iterate-And-Test
If you really do want to iterate over everything, and then test the names, consider a case statement for the job, which is much more efficient than using grep to check just one line:
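One way to write that loop (note the ;; terminators, which the version in the question is missing):

for f in */*.fastq; do
    case $f in
        *R1*) cat "$f" >&3 ;;
        *R2*) cat "$f" >&4 ;;
    esac
done 3>R1.fastq 4>R2.fastq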
Note the use of file descriptors 3 and 4 to write to R1.fastq and R2.fastq respectively -- that way we're only opening the output files once (and thus truncating them exactly once) when the for loop starts, and reusing those file descriptors rather than re-opening the output files at the beginning of each cat. (That said, running cat once per file -- which find -exec {} + avoids -- is probably more overhead on balance.)

Operating Per-Directory
All of the above can be updated to work on a per-directory basis quite trivially. For example:
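A sketch of the per-directory find variant (the -prune expressions keep each directory's own output files out of its input set; the two changes relative to the earlier find command are listed below):

for d in */; do
    # Skip the output files themselves, then cat everything else that matches
    find "$d" -name R1.fastq -prune -o -name '*R1*.fastq' -exec cat '{}' + > "$d"R1.fastq
    find "$d" -name R2.fastq -prune -o -name '*R2*.fastq' -exec cat '{}' + > "$d"R2.fastq
done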
There are only two significant changes:
- The removal of -mindepth, previously used to ensure that our input files only come from subdirectories.
- The exclusion of R1.fastq and R2.fastq from our input files, so we never try to use the same file as both input and output. This is a consequence of the prior change: previously, our output files couldn't be considered as input because they didn't meet the minimum depth.

find in a for loop might suit this:
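One possible shape for that, much like the sketch above but using a plain name test rather than -prune to exclude the outputs:

for d in */; do
    find "$d" -name '*R1*.fastq' ! -name R1.fastq -exec cat '{}' + > "$d"R1.fastq
    find "$d" -name '*R2*.fastq' ! -name R2.fastq -exec cat '{}' + > "$d"R2.fastq
done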