I have a large text file called "main" containing a list of email addresses, and I have already sent mail to some of them. I also have a list of 'sent' addresses. Now I want to remove the 'sent' addresses from "main".
In other words, when removing duplicates I want to drop both of the matching rows from the text file. Example:
I have:
email@email.com
test@test.com
email@email.com
I want:
test@test.com
Is there an easy way to achieve this? Please suggest a tool or method, bearing in mind that the text file is larger than 10MB.
In a terminal (this prints only the lines that occur exactly once in the file test):
cat test| sort | uniq -c | awk -F" " '{if($1==1) print $2}'
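The same filter can probably be written more briefly with uniq -u, which prints only the lines that are not repeated in sorted input. A minimal sketch, assuming a scratch file named demo_main.txt (a made-up name, used only for this illustration):

```shell
# Hypothetical sample data: the main list with the 'sent' address appended
printf '%s\n' email@email.com test@test.com email@email.com > demo_main.txt

# uniq -u keeps only lines that occur exactly once in the sorted input,
# so both copies of a repeated address are dropped
result=$(sort demo_main.txt | uniq -u)
echo "$result"   # prints: test@test.com
```

Like the awk version, this only works when every 'sent' address appears at least twice in the combined file.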
I use Cygwin a lot for such tasks, as the Unix command line is incredibly powerful.
Here's how to achieve what you want:
cat main.txt | sort -u | grep -Fvxf sent.txt
sort -u will remove duplicates (by sorting the main.txt file first), and grep will take care of removing the unwanted addresses.
Here's what the grep options mean:
-F: plain text search
-v: invert results
-x: will force the whole line to match the pattern
-f: read patterns from the specified file
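To see these options working together, here is a small sketch on made-up data. The file names demo_main.txt and demo_sent.txt and the addresses in them are assumptions for illustration only:

```shell
# Hypothetical sample files, for illustration only
printf '%s\n' a@example.com b@example.com a@example.com c@example.com > demo_main.txt
printf '%s\n' b@example.com > demo_sent.txt

# Deduplicate the main list, then drop every line that exactly
# matches (-F -x) one of the lines read from demo_sent.txt (-f),
# keeping only the non-matching lines (-v)
result=$(sort -u demo_main.txt | grep -Fvxf demo_sent.txt)
echo "$result"
# prints:
# a@example.com
# c@example.com
```

Because of -x, an address in the sent list only removes lines that are identical to it, not longer addresses that merely contain it.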
Oh, and if your files are in the Windows format (CR LF newlines), you'll have to do this instead:
cat main.txt | dos2unix | sort -u | grep -Fvxf <(cat sent.txt | dos2unix)
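If dos2unix is not installed, stripping the carriage returns with tr -d '\r' should have the same effect for this purpose. This is a swapped-in alternative rather than part of the command above, sketched on hypothetical demo files and using an intermediate file so it does not depend on process substitution:

```shell
# Hypothetical sample files with Windows (CR LF) line endings
printf 'a@example.com\r\nb@example.com\r\n' > demo_main_crlf.txt
printf 'b@example.com\r\n' > demo_sent_crlf.txt

# Strip the carriage returns from the sent list into a temporary copy
tr -d '\r' < demo_sent_crlf.txt > demo_sent_lf.txt

# Strip them from the main list on the fly, then filter as before
result=$(tr -d '\r' < demo_main_crlf.txt | sort -u | grep -Fvxf demo_sent_lf.txt)
echo "$result"   # prints: a@example.com
```

Note that the output uses Unix newlines, so convert it back if the result has to be opened in a tool that expects CR LF.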
Just like with the Windows command line, you can simply add:
> output.txt
at the end of the command line to redirect the output to a text file.