Given a .txt file with space-separated words such as:
But where is Esope the holly Bastard
But where is
And the awk pipeline:
cat /pathway/to/your/file.txt | tr ' ' '\n' | sort | uniq -c | awk '{print $2"@"$1}'
I get the following output in my console :
1 Bastard
1 Esope
1 holly
1 the
2 But
2 is
2 where
How can I get this printed into myFile.txt? I actually have 300,000 lines and nearly 2 million words, so it would be better to output the result to a file.
EDIT: Used answer (by @Sudo_O):
$ awk '{a[$1]++}END{for(k in a)print a[k],k}' RS=" |\n" myfile.txt | sort > myfileout.txt
Your pipeline isn't very efficient; you should do the whole thing in awk instead.
If you want the output in sorted order, pipe the result through sort.
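A minimal sketch of the awk-only approach, assuming a POSIX shell and awk. The temporary file, the variable names, and the sort keys are illustrative assumptions, and the field loop is swapped in for the regex RS used in the question's EDIT, since not every awk supports a regex record separator:

```shell
# Hypothetical sample file holding the two lines from the question.
sample=$(mktemp)
printf '%s\n' 'But where is Esope the holly Bastard' 'But where is' > "$sample"

# Count every whitespace-separated word in a single awk process,
# then sort by count (numeric) and by word (byte order, for reproducibility).
counts=$(awk '{ for (i = 1; i <= NF; i++) count[$i]++ }
              END { for (word in count) print count[word], word }' "$sample" |
         LC_ALL=C sort -k1,1n -k2,2)

printf '%s\n' "$counts"
rm -f "$sample"
```

awk's for-in loop returns keys in no particular order, which is why the sort stage is still needed when the order matters.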
The actual output given by your pipeline is:
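For the two sample lines in the question, this can be reproduced with the sketch below. The temporary file is a stand-in for the real input, and the C locale is forced on sort so that the order is reproducible (an assumption; the question does not state its locale):

```shell
# Hypothetical sample file holding the two lines from the question.
sample=$(mktemp)
printf '%s\n' 'But where is Esope the holly Bastard' 'But where is' > "$sample"

# The question's pipeline, unchanged apart from LC_ALL=C on sort.
actual=$(cat "$sample" | tr ' ' '\n' | LC_ALL=C sort | uniq -c | awk '{print $2"@"$1}')
printf '%s\n' "$actual"
# Bastard@1
# But@2
# Esope@1
# holly@1
# is@2
# the@1
# where@2
rm -f "$sample"
```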
Note: using cat is useless here; we can just redirect the input with <. The awk script doesn't make sense either: it just reverses the order of the words and their frequencies and separates them with an @. If we drop the awk script, the output is closer to the desired output (notice the leading spaces, however, and that it isn't sorted by count).
We could sort again and remove the leading spaces with sed.
But like I mentioned at the start, let awk handle it.
Just use shell redirection: append > myFile.txt to the end of your pipeline.
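As a rough sketch of the redirection, assuming a POSIX shell: the working directory and file.txt below are hypothetical stand-ins for your real input, and the counting pipeline is the question's own minus cat and the final awk step.

```shell
# Hypothetical working directory and input file.
workdir=$(mktemp -d)
printf '%s\n' 'But where is Esope the holly Bastard' 'But where is' > "$workdir/file.txt"

# "<" feeds the file to tr without cat;
# ">" writes the final output to myFile.txt instead of the console.
tr ' ' '\n' < "$workdir/file.txt" | sort | uniq -c > "$workdir/myFile.txt"

# Nothing was printed above; inspect the file to see the counts.
cat "$workdir/myFile.txt"
```

Note that > overwrites an existing file; >> would append to it instead.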
Tips
A useful command is tee, which lets you redirect to a file and still see the output.
Sorting and locale
I see you are working with Asian script, so you need to be careful with the locale used by your system, as the resulting sort order might not be what you expect.
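A small sketch of how the locale changes the order, assuming a POSIX sort. The four ASCII words are made-up sample data chosen because their byte order is easy to verify; the same mechanism applies to non-ASCII text:

```shell
# Made-up sample words.
words=$(printf '%s\n' b A a B)

# LC_ALL=C forces plain byte order: uppercase ASCII sorts before lowercase.
c_sorted=$(printf '%s\n' "$words" | LC_ALL=C sort)
printf '%s\n' "$c_sorted"

# Order under whatever locale the current environment selects; it may differ.
printf '%s\n' "$words" | sort

# Show the locale-related variables currently set (empty if unset).
printf 'LANG=%s LC_ALL=%s\n' "${LANG-}" "${LC_ALL-}"
```

Setting LC_ALL=C on the sort stage of the counting pipeline is one way to get the same order on every machine, at the cost of ignoring language-specific collation rules.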
And have a look at the output of :
Just redirect output to a file.
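If you also want to see the counts on screen while they are written, the tee command mentioned earlier can stand in for a plain redirect. A minimal sketch, where the input is inline sample text and outfile is a hypothetical temporary path:

```shell
# Hypothetical output path.
outfile=$(mktemp)

# tee copies its input both to the file and to standard output.
printf '%s\n' 'But where is' 'But where is' | tr ' ' '\n' | sort | uniq -c | tee "$outfile"
```

With a couple of million words the on-screen copy is probably of limited use, so the plain > redirect is likely the better fit for the full file.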