I have a folder reviews_folder
that contains lots of files, such as hotel_217616.dat
. I have written a script countreviews.sh
to check the number of times the word "Author" appears in each file and then print the number out for each respective file. Here is my script:
grep -r "<Author>" "#1"
I cannot hard-code reviews_folder
in the script; it must be taken as a command-line argument, hence #1
. The number of times my word appears in each file must then be ranked from highest to lowest, for example:
-- run script --
49
23
17
However, when I run my script it says "#1: No such file or directory"
; why isn't it replacing #1
with reviews_folder
when I type:
./countreviews.sh reviews_folder
My countreviews.sh
is sitting in the same directory as my reviews_folder
, which contains the files I will be checking if that matters.
First off, the positional parameter is $1
and not #1
.
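As a minimal sketch of the difference (show_args is a hypothetical helper name, used only for illustration): inside double quotes, "$1" expands to the first argument, while "#1" has no special meaning and is passed on as the literal text #1, which is why grep went looking for a file with that name.

```shell
#!/bin/sh
# show_args is a hypothetical helper, not part of the original script.
show_args() {
    # "$1" expands to the first argument given to the function (or script).
    printf 'dollar: %s\n' "$1"
    # "#1" is not an expansion; inside quotes it stays the literal text #1.
    printf 'hash: %s\n' "#1"
}

show_args reviews_folder
```

Unquoted, a word starting with # would begin a comment in a script, so the argument would simply be dropped instead.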
Secondly, your script doesn't really "count the number of times the word Author
appears"; it looks literally for <Author>
, including the angle brackets.
I assume you wanted word boundaries, as in \<Author\>
.
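A small sketch of how the two patterns differ, assuming GNU grep (where \< and \> mark word boundaries). The sample lines and the helper names count_literal and count_word are made up for illustration.

```shell
#!/bin/sh
# Sketch only: assumes GNU grep; helper names and sample data are hypothetical.

# Count lines containing the literal text <Author>, brackets included.
count_literal() {
    grep -c '<Author>' "$1"
}

# Count lines containing Author as a whole word.
count_word() {
    grep -c '\<Author\>' "$1"
}

sample=$(mktemp)
printf '%s\n' '<Author>alice' 'Author bob' 'Authors carol' > "$sample"

count_literal "$sample"   # prints 1: only the first line has the brackets
count_word "$sample"      # prints 2: lines one and two; "Authors" is not a whole word
rm -f "$sample"
```

If the files really do contain a literal <Author> tag, the original pattern may be exactly what is wanted.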
grep -r
just lists all matching lines, prefixed with their filenames. You want only the counts, sorted. To do this, you can use
grep -rwch 'Author'
-w
searches for word matches
-c
returns a match count per file
-h
suppresses writing the file name
And to sort the output, you pipe it to sort
:
grep -rwch 'Author' | sort -nr
-n
is for "numerical sort", and -r
for "reverse", so the largest number is first.
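Putting the pieces together, countreviews.sh might look something like the sketch below. It assumes GNU grep (for -r, -w and -h) and passes the folder along as "$1" instead of relying on the current directory; the function name count_reviews is just an illustrative choice.

```shell
#!/bin/sh
# Sketch of countreviews.sh; assumes GNU grep.
# count_reviews is a hypothetical name, not from the original script.
count_reviews() {
    dir=$1
    # -r recurse, -w whole-word match, -c count matching lines per file,
    # -h suppress file names; then sort numerically, largest first.
    grep -rwch 'Author' "$dir" | sort -nr
}

# Only run when a folder was actually given on the command line.
if [ "$#" -ge 1 ]; then
    count_reviews "$1"
fi
```

It would then be called as ./countreviews.sh reviews_folder, as in the question.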
Notice how this still only counts how many lines matched "Author"; if there is a line with five matches, it is counted only as one by grep -c
.
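This is easy to check with a throwaway file. The sketch below uses made-up sample data and a hypothetical helper name, count_matching_lines; one line holds three matches, yet the count is 1.

```shell
#!/bin/sh
# Sketch only: sample data and helper name are made up for illustration.
count_matching_lines() {
    # -c reports the number of matching lines, not the number of matches.
    grep -wc 'Author' "$1"
}

sample=$(mktemp)
printf '%s\n' 'Author Author Author' 'no match here' > "$sample"

count_matching_lines "$sample"   # prints 1, although the word appears three times
rm -f "$sample"
```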
To properly count every single occurrence, you could do this:
find . -type f -exec bash -c 'grep -wo "Author" {} | wc -l' \; | sort -nr
find . -type f
recursively finds all regular files.
-exec
executes a command for each file found. Because that command contains a pipe, we have to run it in a separate shell with bash -c
.
grep -wo "Author" {} | wc -l
finds every match of Author
and prints it on a separate line; wc -l
then counts the lines.
After this has run for every file,
sort -nr
again sorts the results.
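A somewhat more defensive variant of the same idea, as a sketch: it takes the folder as an argument and hands each file name to the inner shell as "$1" instead of pasting {} into the command string, which avoids trouble with file names containing spaces or quotes. The function name count_all is hypothetical, and grep -o is assumed to be available (it is in GNU grep).

```shell
#!/bin/sh
# Sketch only: count_all is a hypothetical name; assumes grep supports -o and -w.
count_all() {
    dir=$1
    # For each regular file, print every whole-word match on its own line
    # and count those lines. The file name reaches the inner shell as "$1";
    # the word "sh" before {} only fills the $0 slot.
    find "$dir" -type f -exec sh -c 'grep -wo "Author" "$1" | wc -l' sh {} \; |
        sort -nr
}

# Only run when a folder was actually given on the command line.
if [ "$#" -ge 1 ]; then
    count_all "$1"
fi
```

Some wc implementations pad the number with leading spaces; sort -n still orders such lines correctly.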
I think you mean $1
, not #1
.