Input:
a;3;c;1
a;4;b;2
a;5;c;1
Output:
a;3;c;1
a;5;c;1
Hence, all lines that share their values in columns 1, 3 and 4 with at least one other line should be printed.
If a 2-pass approach is OK:
$ awk -F';' '{key=$1 FS $3 FS $4} NR==FNR{cnt[key]++;next} cnt[key]>1' file file
a;3;c;1
a;5;c;1
otherwise:
$ awk -F';' '
    { key=$1 FS $3 FS $4; a[key,++cnt[key]]=$0 }
    END {
        for (key in cnt)
            if (cnt[key] > 1)
                for (i=1; i<=cnt[key]; i++)
                    print a[key,i]
    }
' file
a;3;c;1
a;5;c;1
The keys in that second script are output in an unspecified order, because that is how the in
operator iterates over an array. That is easily fixed if it's an issue.
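One possible way to address the ordering, sketched here as an assumption rather than as the answerer's intended fix: store every line by its line number and loop over the line numbers in END instead of over the keys. The array names line and keyof and the temporary file are made up for this illustration. Note that this prints matching lines in input order rather than grouped by key.

```shell
# Sample data from the question, written to a temporary file.
file=$(mktemp)
printf '%s\n' 'a;3;c;1' 'a;4;b;2' 'a;5;c;1' > "$file"

out=$(awk -F';' '
    {
        key = $1 FS $3 FS $4
        line[NR]  = $0       # remember every line by its line number
        keyof[NR] = key      # and the key that line belongs to
        cnt[key]++           # count how many lines share each key
    }
    END {
        # Walk the line numbers in ascending order so that the
        # input order is preserved, unlike "for (key in cnt)".
        for (i = 1; i <= NR; i++)
            if (cnt[keyof[i]] > 1)
                print line[i]
    }
' "$file")

printf '%s\n' "$out"
rm -f "$file"
```

Like the original one-pass script, this holds the whole file in memory, so the 2-pass version is probably the better choice for very large inputs.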
Give this one-liner a try:
awk -F';' '{k=$1 FS $3 FS $4}
NR==FNR{if(a[k]){p[a[k]];p[NR]}a[k]=NR;next}FNR in p' file file
It goes through the file twice: the first pass records the numbers of the lines that should be printed, and the second pass prints those lines.
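For readers who find the compact form hard to follow, here is the same logic spread over several lines with comments. The sample file is created only for this illustration; the awk program itself is unchanged apart from whitespace.

```shell
# Sample data from the question, written to a temporary file.
file=$(mktemp)
printf '%s\n' 'a;3;c;1' 'a;4;b;2' 'a;5;c;1' > "$file"

out=$(awk -F';' '
    { k = $1 FS $3 FS $4 }      # key built from columns 1, 3 and 4

    NR == FNR {                 # first pass over the file
        if (a[k]) {             # key was seen before on line a[k]
            p[a[k]]             # mark that earlier line for printing
            p[NR]               # mark the current line as well
        }
        a[k] = NR               # remember the latest line with this key
        next
    }

    FNR in p                    # second pass: print the marked lines
' "$file" "$file")

printf '%s\n' "$out"
rm -f "$file"
```

Because the second pass prints by line number, the matching lines come out in their original input order.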
Here is my solution:
awk 'BEGIN { FS=";" } NR==1 { split($0, a, ";"); print } NR>1 { if (a[1] == $1 && a[3] == $3 && a[4] == $4) print }' file
Output:
a;3;c;1
a;5;c;1
Of course, this works only if the line whose columns 1, 3 and 4 are to be matched is the first line of the file.