How can I perform an operation for each item listed by grep individually?
Background:
I use grep to list all files containing a certain pattern:
grep -l '<pattern>' directory/*.extension1
I want to delete all listed files, and also all files with the same file name but a different extension: .extension2.
I tried using the pipe, but it seems to take the output of grep as a whole.
In find there is the -exec option, but grep has nothing like that.
If I understand your specification, you want:
grep --null -l '<pattern>' directory/*.extension1 | \
xargs -0 -I{} bash -c 'rm "$1" "${1%.*}.extension2"' -- {}
This is essentially the same as what @triplee's comment describes, except that it's newline-safe.
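To try the xargs/bash -c pipeline without risking real files, you could run it in a throwaway directory first. This is only a sketch: the directory layout, the file names, and the pattern `needle` are made up for the demo, and `mktemp -d` is assumed to be available (it is on GNU/Linux and macOS).

```shell
#!/usr/bin/env bash
# Sandbox check of the grep | xargs | bash -c pipeline.
# All file names and the pattern 'needle' are hypothetical demo data.
set -euo pipefail

demo=$(mktemp -d)
cd "$demo"
mkdir directory

# Two matching files (one with a space in its name) and one non-matching file,
# each with a sibling .extension2 file.
printf 'needle\n' > 'directory/a b.extension1'
printf 'needle\n' > 'directory/c.extension1'
printf 'hay\n'    > 'directory/d.extension1'
touch 'directory/a b.extension2' 'directory/c.extension2' 'directory/d.extension2'

grep --null -l 'needle' directory/*.extension1 |
  xargs -0 -I{} bash -c 'rm "$1" "${1%.*}.extension2"' -- {}

# Only the non-matching pair should remain.
remaining=$(ls directory | tr '\n' ' ')
echo "remaining: $remaining"

cd / && rm -r "$demo"
```

If the output lists only the d.* pair, the pipeline behaves as intended on names with spaces.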
What's going on here?
grep with --null (together with -l) terminates each file name with a null byte instead of a newline. Since file names can contain newlines, newline-delimited output from grep cannot be parsed safely; a null byte, however, can never appear in a file name, so it makes a reliable delimiter.
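A small sketch of why this matters: with one file whose name contains a newline, counting lines overstates the number of matches, while counting null terminators gives the right answer. The file names and the pattern `needle` are made up for the demo.

```shell
#!/usr/bin/env bash
# Compare newline-delimited and NUL-delimited output of grep -l.
# File names and the pattern 'needle' are hypothetical demo data.
set -euo pipefail

demo=$(mktemp -d)
printf 'needle\n' > "$demo/plain.extension1"
printf 'needle\n' > "$demo/"$'with\nnewline.extension1'   # name contains a newline

# Newline-delimited: the embedded newline makes two matches look like three lines.
lines=$(grep -l 'needle' "$demo"/*.extension1 | wc -l)
# NUL-delimited: keep only the NUL bytes and count them (one per file name).
names=$(grep --null -l 'needle' "$demo"/*.extension1 | tr -cd '\0' | wc -c)

echo "newline-delimited lines: $lines"
echo "NUL-terminated names: $names"
rm -r "$demo"
```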
xargs reads items from standard input and executes a given command (or echo if no command is given), passing as many of those items as fit, one per parameter. By default it splits items on blanks (spaces and tabs) as well as newlines, and it also interprets quotes and backslashes. Thus if you said:
printf 'one\ntwo three \nfour\n' | xargs echo
xargs would execute echo one two three four. This is not safe for file names, because file names may contain spaces, quotes, or even embedded newlines.
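To see the default splitting directly, you can have xargs hand its arguments to printf, which prints each one in brackets on its own line. The input is the same toy text as above.

```shell
#!/usr/bin/env bash
# Default xargs splits its input on blanks as well as newlines,
# so "two three " becomes two separate arguments.
set -euo pipefail
out=$(printf 'one\ntwo three \nfour\n' | xargs printf '[%s]\n')
printf '%s\n' "$out"
```

This prints four bracketed items ([one], [two], [three], [four]), not three.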
The -0 switch to xargs changes it to split items only on null bytes (and to treat quotes and backslashes literally). This matches the output we got from grep --null and makes it safe for processing a list of file names.
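The same experiment with null-delimited input and -0 keeps "two three " together as one item, trailing space included. The input is toy data:

```shell
#!/usr/bin/env bash
# With -0, only NUL bytes separate items; blanks and quotes are literal.
set -euo pipefail
out=$(printf 'one\0two three \0four\0' | xargs -0 printf '[%s]\n')
printf '%s\n' "$out"
```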
Normally xargs simply appends the input items to the end of a command. The -I switch to xargs changes this: it substitutes each input item for the specified replacement string and runs the command once per item (with -I, items are also split only at newlines, not at blanks). To get the idea, try this experiment:
printf 'one\ntwo three \nfour\n' | xargs -I{} echo foo {} bar
And note the difference from the earlier printf | xargs command.
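Combining -0 with -I, as the solution does, gives one command per null-terminated item, with the whole item (spaces included) substituted wherever {} appears. A minimal sketch with made-up file names:

```shell
#!/usr/bin/env bash
# -0 -I{}: one printf invocation per NUL-terminated item.
set -euo pipefail
out=$(printf 'a b.txt\0c.txt\0' | xargs -0 -I{} printf 'run: <%s>\n' {})
printf '%s\n' "$out"
```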
In the case of my solution the command I execute is bash, to which I pass -c. The -c switch causes bash to execute the commands in the following argument (and then terminate) instead of starting an interactive shell. The next block, 'rm "$1" "${1%.*}.extension2"', is the argument to -c and is the script which will be executed by bash. Any arguments following the script are assigned to the script's positional parameters, starting with $0. Thus, if I were to say:
bash -c 'echo $0' "Hello, world"
then Hello, world would be assigned to $0 (the first argument after the script) and inside the script I could echo it back.
Since $0 is normally reserved for the script name, I pass a dummy value (in this case --) as the first argument, and then, in place of the second argument, I write {}, which is the replacement string I specified for xargs. xargs replaces it with each file name parsed from grep's output before bash is executed, so inside the script that file name is $1.
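A quick way to confirm which value lands where is to print $0 and $1 from a bash -c script, passing -- as the dummy $0 exactly as the solution does. The file name is made up:

```shell
#!/usr/bin/env bash
# The first argument after the -c script becomes $0, the next becomes $1.
set -euo pipefail
out=$(bash -c 'printf "0=[%s] 1=[%s]\n" "$0" "$1"' -- 'my file.extension1')
printf '%s\n' "$out"
```

The dummy -- ends up in $0, and the file name, space and all, arrives intact in $1.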
The mini shell script might look complicated but it's rather trivial. First, the entire script is single-quoted to prevent the calling shell from interpreting it. Inside the script I invoke rm and pass it two file names to remove: the $1 argument, which is the file name xargs substituted for {}, and ${1%.*}.extension2. The latter is a parameter expansion on the $1 variable. The important part is %.*, which says:
%: remove the shortest string matching the pattern from the end of the variable's value.
.*: the pattern, a single period followed by anything.
This effectively strips the extension, if any, from the file name. You can observe the effect yourself:
foo='my file.txt'
bar='this.is.a.file.txt'
baz='no extension'
printf '%s\n' "${foo%.*}" "${bar%.*}" "${baz%.*}"
Since the extension has been stripped, I append the desired alternate extension .extension2 to the stripped file name to obtain the alternate file name.
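One caveat worth knowing: %.* removes from the last dot anywhere in the string, including a dot in a directory name. That is harmless here, because every name grep lists ends in .extension1, but removing the exact suffix is more robust in general. The paths below are made up for illustration:

```shell
#!/usr/bin/env bash
# ${var%.*} strips from the last dot, even when that dot is in a directory name;
# ${var%.extension1} strips only that exact suffix.
set -euo pipefail
f='dir.v2/README'
g='dir.v2/notes.extension1'
a=${f%.*}           # dir (the dot in the directory name was matched)
b=${f%.extension1}  # dir.v2/README (no such suffix, unchanged)
c=${g%.*}           # dir.v2/notes
d=${g%.extension1}  # dir.v2/notes
printf '%s\n' "$a" "$b" "$c" "$d"
```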
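The following prints one rm command per matching file; if the output is what you want, pipe it through /bin/sh (file names containing ", $ or ` would break the generated quoting):
grep -l 'RE' folder/*.ext1 | sed 's/\(.*\)\.ext1$/rm "&" "\1.ext2"/'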
Or if sed makes you itchy:
grep -l 'RE' folder/*.ext1 | while IFS= read -r file; do
echo rm "$file" "${file%.ext1}.ext2"
done
Remove echo if the output looks like the commands you want to run.
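Once the dry run looks right, the loop without echo can be checked in a throwaway directory. In this sketch 'RE' is replaced by the made-up pattern `needle`, and the folder and file names are hypothetical:

```shell
#!/usr/bin/env bash
# Run the while-read loop (echo removed) against made-up files.
set -euo pipefail
demo=$(mktemp -d)
cd "$demo"
mkdir folder
printf 'needle\n' > 'folder/x y.ext1'; touch 'folder/x y.ext2'
printf 'hay\n'    > 'folder/z.ext1';   touch 'folder/z.ext2'

# IFS= keeps leading/trailing blanks; -r keeps backslashes literal.
grep -l 'needle' folder/*.ext1 | while IFS= read -r file; do
  rm -- "$file" "${file%.ext1}.ext2"
done

remaining=$(ls folder | tr '\n' ' ')
echo "remaining: $remaining"
cd / && rm -r "$demo"
```

Note that a line-based read like this still cannot handle names with embedded newlines; the null-delimited xargs version can.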
But you can do this with find as well:
find /path/to/start -name \*.ext1 -exec grep -q 'RE' {} \; -print | ...
where ... is either the sed script or the three lines from while to done.
The idea here is that find will ... well, "find" things based on the qualifiers you give it: namely, that things match the file glob "*.ext1", AND that the "-exec" succeeds. The -q tells grep to look for RE in {} (the file supplied by find) and report the result only through its exit status (zero for a match), without generating any output of its own.
The only real difference between doing this in find vs doing it with grep is that you get to use find's awesome collection of conditions to narrow down your search further if required. See man find for details. By default, find will recurse into subdirectories.
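If you'd rather skip the pipe entirely, find can perform the removal itself with a second -exec, which also sidesteps any file-name delimiting issues. This is a sketch in a throwaway tree; `needle` stands in for RE and all names are made up:

```shell
#!/usr/bin/env bash
# find-only variant: the second -exec runs only where grep -q succeeded.
set -euo pipefail
demo=$(mktemp -d)
mkdir -p "$demo/sub dir"
printf 'needle\n' > "$demo/sub dir/a.ext1"; touch "$demo/sub dir/a.ext2"
printf 'hay\n'    > "$demo/b.ext1";         touch "$demo/b.ext2"

# Inside sh -c, "$1" is the path find substituted for {}; "sh" fills $0.
find "$demo" -name '*.ext1' -exec grep -q 'needle' {} \; \
  -exec sh -c 'rm -f -- "$1" "${1%.ext1}.ext2"' sh {} \;

remaining=$(cd "$demo" && find . -type f | sort | tr '\n' ' ')
echo "remaining: $remaining"
rm -r "$demo"
```

The match in the subdirectory is removed along with its .ext2 sibling, showing the default recursion.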
You can pipe the list to xargs:
grep -l '<pattern>' directory/*.extension1 | xargs rm
As for the second set of files with a different extension, I'd do this (as usual, use xargs echo rm for a dry run when testing; I haven't tested it, and it will not work correctly with file names containing spaces):
filelist=$(grep -l '<pattern>' directory/*.extension1)
echo $filelist | xargs rm
echo ${filelist//.extension1/.extension2} | xargs rm
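If you need to handle spaces, one alternative (a substitute technique, not the approach above) is to read the list into a bash array and rewrite only the suffix of each element. A sketch assuming bash 4 or later for mapfile; the file names and pattern `needle` are made up:

```shell
#!/usr/bin/env bash
# Array-based variant: safe for spaces, but like any line-based read,
# not for names containing newlines. Requires bash 4+ (mapfile).
set -euo pipefail
demo=$(mktemp -d)
cd "$demo"
mkdir directory
printf 'needle\n' > 'directory/a b.extension1'; touch 'directory/a b.extension2'
printf 'hay\n'    > 'directory/c.extension1';   touch 'directory/c.extension2'

mapfile -t files < <(grep -l 'needle' directory/*.extension1)
# ${files[@]/%.extension1/.extension2} replaces the suffix only at the end of each element.
rm -f -- "${files[@]}" "${files[@]/%.extension1/.extension2}"

remaining=$(ls directory | tr '\n' ' ')
echo "remaining: $remaining"
cd / && rm -r "$demo"
```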
Pipe the result to xargs; it will run a command on the matches (add -n 1 if you need one invocation per match).
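By default xargs batches as many items as fit into a single command, so "for each match" really means one command for many matches. Adding -n 1 forces one invocation per item. The toy input below is made up:

```shell
#!/usr/bin/env bash
# Batched (default) vs one-invocation-per-item (-n 1).
set -euo pipefail
batched=$(printf 'a\0b\0c\0' | xargs -0 echo run:)
single=$(printf 'a\0b\0c\0' | xargs -0 -n 1 echo run:)
printf '%s\n---\n%s\n' "$batched" "$single"
```

The first prints a single "run: a b c" line; the second prints one "run:" line per item.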