I have a plain text file with words separated by commas, for example:
word1, word2, word3, word2, word4, word5, word3, word6, word7, word3
I want to delete the duplicates so that it becomes:
word1, word2, word3, word4, word5, word6, word7
Any ideas? I think egrep could help, but I'm not sure exactly how to use it.
Open the file with vim (vim filename) and run the sort command with the unique flag (:sort u). Note that :sort works on whole lines, so the words need to be one per line first.
I had the very same problem today: a word list with 238,000 words, about 40,000 of which were duplicates. I already had them on individual lines by doing
To remove the duplicates, I simply did
It worked perfectly, with no errors, and my file is now down from 1.45 MB to 1.01 MB.
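For comparison, the "sort and keep unique lines" step that :sort u performs on a one-word-per-line list can be sketched in Python. This is a rough equivalent, not the commands used above; the function name sort_unique_lines is hypothetical, it drops empty lines, and it sorts by code point rather than by locale.

```python
def sort_unique_lines(text: str) -> str:
    """Return the distinct non-empty lines of text, sorted, one per line.

    Roughly what vim's :sort u does to a word list. Empty lines are dropped
    and ordering is by code point, not locale-aware collation.
    """
    lines = {line for line in text.splitlines() if line}
    return "\n".join(sorted(lines)) + "\n" if lines else ""


if __name__ == "__main__":
    sample = "word2\nword1\nword3\nword2\nword3\n"
    print(sort_unique_lines(sample), end="")
```

Using a set makes the deduplication independent of order, so the sort is only needed for the final output, not for correctness.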
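I think you'll want to replace the separators (comma plus space) with newlines, sort the lines, use the uniq command to drop the duplicate lines, and then join the lines back together with commas. The sort step matters because uniq only removes duplicates that are on adjacent lines.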
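The split, sort, uniq, join pipeline can be sketched in Python to show why the sort is needed. The helper names uniq and dedupe_csv_line are hypothetical; uniq mimics the command by collapsing only runs of adjacent identical items, using itertools.groupby.

```python
from itertools import groupby


def uniq(lines):
    """Collapse runs of adjacent identical items, like the uniq command."""
    return [key for key, _run in groupby(lines)]


def dedupe_csv_line(line: str, sep: str = ", ") -> str:
    """Split on commas, sort, drop adjacent repeats, and re-join with sep."""
    words = [w.strip() for w in line.split(",") if w.strip()]
    return sep.join(uniq(sorted(words)))


if __name__ == "__main__":
    # Without sorting, non-adjacent duplicates survive:
    print(uniq(["word2", "word3", "word2"]))
    print(dedupe_csv_line(
        "word1, word2, word3, word2, word4, word5, word3, word6, word7, word3"
    ))
```

After sorting, every copy of a word sits next to the others, so collapsing adjacent runs is enough to remove all duplicates.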
I presumed you wanted the words to be unique on a single line, rather than throughout the file. If this is the case, then the Perl script below will do the trick.
If you want uniqueness over the whole file, you can just move the %seen hash outside the while loop.
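The Perl script itself is not reproduced here. As a rough Python stand-in (not the original script), the same "seen" idea looks like this: each word is kept only the first time it appears, so the original order is preserved. The function dedupe_lines and its whole_file flag are hypothetical; whole_file=True mirrors moving %seen outside the loop.

```python
import sys
from typing import Iterable, Iterator


def dedupe_lines(lines: Iterable[str], whole_file: bool = False) -> Iterator[str]:
    """Yield each comma-separated line with repeated words removed.

    With whole_file=False the seen set is reset for every line (uniqueness
    per line); with whole_file=True it is shared across all lines.
    """
    seen = set()
    for line in lines:
        if not whole_file:
            seen = set()  # fresh set per line, like %seen declared inside the loop
        kept = []
        for word in line.rstrip("\n").split(","):
            word = word.strip()
            if word and word not in seen:
                seen.add(word)
                kept.append(word)
        yield ", ".join(kept)


if __name__ == "__main__":
    for out in dedupe_lines(sys.stdin):
        print(out)
```

Unlike the sort-based approaches, this keeps words in their first-seen order instead of sorting them.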