I have a folder reviews_folder that contains lots of files, such as hotel_217616.dat. I have written a script countreviews.sh to check the number of times the word "Author" appears in each file and then print the number out for each respective file. Here is my script:
grep -r "<Author>" "#1"
I cannot write reviews_folder in the shell code; it must be taken as an argument on the command line, hence #1. The number of times my word appears in each file must then be ranked from highest to lowest, for example:
-- run script --
49
23
17
However, when I run my script it says "#1: No such file or directory"; why isn't it replacing #1 with reviews_folder when I type:
./countreviews.sh reviews_folder
My countreviews.sh is sitting in the same directory as my reviews_folder, which contains the files I will be checking, if that matters.
First off, the positional parameter is $1 and not #1.
Secondly, your script doesn't really "count the number of times the word Author appears"; it looks literally for <Author>, including the angle brackets. I assume you wanted word boundaries, as in \<Author\>.
grep -r just lists all matching lines, prepended by filenames. You want only the count, and sorted. To do this, you can add the following options to grep:
-w searches for word matches
-c returns a match count per file
-h suppresses writing the file name
And to sort the output, you pipe it to sort with two options: -n is for "numerical sort", and -r for "reverse", so the largest number is first.
Notice how this still only counts how many lines matched "Author"; if there is a line with five matches, it is counted only as one by grep -c.
To properly count every single occurrence, you could do this:
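A sketch of what such a command could look like, reconstructed from the explanation in this answer rather than quoted from it. Two things here are assumptions made for illustration: the search starts at the directory passed as "$1" instead of ., and the file name is handed to the subshell as an argument instead of writing {} inside the quoted command, so that file names containing spaces still work. The function name count_author_occurrences is hypothetical.

```shell
#!/bin/bash
# Count every occurrence of the whole word "Author" in each file under
# a directory and print the counts, largest first.

count_author_occurrences() {
    # find ... -type f : walk the directory recursively, regular files only
    # bash -c '...' _ {} : run the pipeline once per file; the file name
    #                      arrives in the subshell as its own "$1"
    # grep -wo         : print each whole-word match on its own line
    # wc -l            : count those lines, i.e. the matches in that file
    # sort -nr         : numeric sort, largest count first
    find "$1" -type f -exec bash -c 'grep -wo "Author" "$1" | wc -l' _ {} \; | sort -nr
}

# Only run when a directory was given on the command line.
if [ "$#" -ge 1 ]; then
    count_author_occurrences "$1"
fi
```

Spawning one bash per file is slow on very large folders, but it keeps the pipeline easy to read.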
find . -type f recursively finds all files.
-exec executes a command for each file found. Because we use a pipe in that command, we have to spawn a subshell with bash -c.
grep -wo "Author" {} | wc -l finds every match of Author and prints it on a separate line; wc -l then counts the lines.
sort -nr again sorts the results.
I think you mean $1, not #1.
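With that fix applied, countreviews.sh could look something like the sketch below. It combines $1 with the grep options and the sort pipeline described in the first answer. The function name count_author_lines and the argument check are additions made here for illustration only. Like grep -c in general, it counts matching lines, not individual occurrences.

```shell
#!/bin/bash
# countreviews.sh: for each file under the given folder, print the number
# of lines containing the whole word "Author", highest count first.

count_author_lines() {
    # -r recurse into the folder, -w match whole words only,
    # -c print a count per file, -h leave out the file names
    grep -rwch "Author" "$1" | sort -nr
}

# "$1" is the first command-line argument, e.g. reviews_folder.
if [ "$#" -ge 1 ]; then
    count_author_lines "$1"
fi
```

It is run the same way as in the question:

./countreviews.sh reviews_folder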