I have a text file listing domain names along with the number of times each one occurs in a collection of email files. For example:
598 aol.com
1 aOL.COM
4 Aol.com
1 AOl.com
6 AOL.com
39 AOL.COM
There were 598 emails sent to aol.com and 1 sent to aOL.COM and so on. I was wondering if there was a way in bash to combine aol.com and aOL.COM and all the other aliases since they are in fact the same thing. Any help would be greatly appreciated!
This is the line of code that produced that output:
grep -E -o -r "\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,6}\b" $ARCHIVE | sed 's/.*@//' | sort | uniq -c > temp2
Add the -i (--ignore-case) flag to the uniq command in your one-liner. According to the uniq man page, this flag ignores differences in case when comparing lines.
I would recommend changing the pipeline that produces this output to first make everything lowercase (see "Converting string to lower case in Bash shell scripting"), then sort and count.
Doing this after the fact would just make your life harder.
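A minimal sketch of the lowercase-first approach, assuming one email address per line on stdin. The function name count_domains and the sample addresses are made up for illustration; only sed, tr, sort and uniq from the original one-liner are used.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads one email address per line on stdin and
# prints a case-insensitive count per domain.
count_domains() {
  sed 's/.*@//' |                 # keep only the domain part
    tr '[:upper:]' '[:lower:]' |  # normalize case before sorting
    sort |                        # group identical domains together
    uniq -c                       # count each group
}

# Sample input, invented for illustration: three aol.com variants
# and one other domain.
printf '%s\n' 'a@aol.com' 'b@AOL.COM' 'c@Aol.com' 'd@example.org' | count_domains
```

In the original one-liner this amounts to inserting tr '[:upper:]' '[:lower:]' between the sed and sort stages. If you go with uniq -i instead, keep in mind that uniq only merges adjacent lines, so the sort step likely needs to fold case as well (for example sort -f); otherwise variants such as AOL.COM and aol.com can end up separated by other domains and be counted separately.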