I have two source files, an English one (EN.txt) and an Italian one (IT.txt), with the same number of lines. I run an awk command
to remove every line with more than 2 words from IT.txt. These are the input files:
EN.txt
Santa Claus
Pigs don't fly
The son of the father
Elf
Santa Claus
Elf
Sabatons
Shoes
IT.txt
Babbo Natale
I maiali non volano
Il figlio del padre
Elfo
Babbo Natale
Elfo
Scarpe
Scarpe
So basically I get this kind of output:
EN.txt
Santa Claus
Pigs don't fly
The son of the father
Elf
Santa Claus
Elf
Sabatons
Shoes
IT.txt
Babbo Natale
Elfo
Babbo Natale
Elfo
Scarpe
Scarpe
But at the same time, I'd like to remove the corresponding lines from EN.txt. Running a second awk command on EN.txt to drop its lines with more than 2 words would not work, because a translation can have a different number of words than its source string. For a moment I thought of working with line numbers, but then I found a better solution. So the filtering has to be driven by the IT file, and the EN file must simply follow whatever was removed there. Therefore, my filtered output must look like this:
EN.txt
Santa Claus
Elf
Santa Claus
Elf
Sabatons
Shoes
IT.txt
Babbo Natale
Elfo
Babbo Natale
Elfo
Scarpe
Scarpe
This is the command I tried (suggested in a previous question), and it works perfectly:
awk 'NR==FNR{if(NF>3){a[NR]}else{a[NR]=1;print > "filtered_it.txt"}} NR!=FNR && a[FNR]{print > "filtered_en.txt"}' IT.txt EN.txt
Now I'd like to extend this command so that it also removes duplicates, while being careful with lines that have the same Italian translation but different source strings (like Sabatons and Shoes, both translated as Scarpe). In other words, duplicates have to be removed from both files together, not from each file with its own separate command. The output should look like this:
EN.txt
Santa Claus
Elf
Sabatons
Shoes
IT.txt
Babbo Natale
Elfo
Scarpe
Scarpe
Your spec is very confusing, but I think this is what you wanted. Also, if the two files are supposed to be matched line by line, it's easier to join them first instead of operating on two separate files.
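A minimal sketch of the join-first idea, under a few assumptions: neither file contains tab characters (paste joins with a tab by default), the output names filtered_en.txt and filtered_it.txt are the ones from the command in the question, and "duplicate" means the whole EN/IT pair is repeated. The max variable is a name I made up for the word limit; it is set to 2 to match the description, whereas the question's command tests NF>3, so adjust it if lines with 3 words should be kept. The first two printf lines only recreate the sample files.

```shell
# Recreate the sample files from the question
printf '%s\n' 'Santa Claus' "Pigs don't fly" 'The son of the father' 'Elf' \
    'Santa Claus' 'Elf' 'Sabatons' 'Shoes' > EN.txt
printf '%s\n' 'Babbo Natale' 'I maiali non volano' 'Il figlio del padre' 'Elfo' \
    'Babbo Natale' 'Elfo' 'Scarpe' 'Scarpe' > IT.txt

# Join the files line by line (tab-separated), then filter each pair as a unit
paste EN.txt IT.txt |
awk -F'\t' -v max=2 '
    # split() returns the number of whitespace-separated words in the Italian column
    split($2, words, " ") > max { next }   # Italian side has too many words: drop the pair
    seen[$0]++ { next }                    # this exact EN+IT pair was already kept: drop it
    {
        print $1 > "filtered_en.txt"       # English column
        print $2 > "filtered_it.txt"       # Italian column
    }
'
```

Because the whole joined line is used as the key in seen, Sabatons/Scarpe and Shoes/Scarpe count as different pairs and both survive, while the repeated Santa Claus/Babbo Natale and Elf/Elfo pairs are kept only once.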
P.S. Either you believe time travel is possible, or you wrote "tomorrow" where you meant "yesterday" :)