Using awk, how do I print all lines containing dupl…

Posted 2019-09-14 12:49

Question:

Input:

a;3;c;1
a;4;b;2
a;5;c;1

Output:

a;3;c;1
a;5;c;1

That is, print every line whose combination of columns 1, 3 and 4 also appears on at least one other line.
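It can help to look at the key each line is grouped by before filtering on it. The sketch below uses a hypothetical helper named `show_keys` that prints columns 1, 3 and 4 joined by the field separator, followed by a tab and the original line. This is the same key that the answers below build with `$1 FS $3 FS $4`:

```shell
# Hypothetical helper: for each input line, print its key (columns 1, 3, 4
# joined with ";"), a tab, and then the original line.
show_keys() {
    awk -F';' '{ print $1 FS $3 FS $4 "\t" $0 }' "$@"
}

# Sample input from the question, fed on stdin
printf 'a;3;c;1\na;4;b;2\na;5;c;1\n' | show_keys
```

Lines 1 and 3 both get the key `a;c;1`, and line 2 gets `a;b;2`. Only the first two keys are shared, so lines 1 and 3 are the ones to print.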

Answer 1:

If a 2-pass approach is OK:

$ awk -F';' '{key=$1 FS $3 FS $4} NR==FNR{cnt[key]++;next} cnt[key]>1' file file
a;3;c;1
a;5;c;1

Otherwise, in a single pass:

$ awk -F';' '
    { key=$1 FS $3 FS $4; a[key,++cnt[key]]=$0 }
    END {
        for (key in cnt)
            if (cnt[key] > 1)
                for (i=1; i<=cnt[key]; i++)
                    print a[key,i]
    }
' file
a;3;c;1
a;5;c;1

In the second script, the order in which the keys are output is unspecified, because `for (key in cnt)` visits array elements in an implementation-defined order. This is easy to fix if the order matters.
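One possible fix is sketched below. It assumes that holding the whole input in memory is acceptable. Each line and its key are stored by line number, and the `END` block then loops over the line numbers in order instead of using `for (key in ...)`. The function name `dups_in_order` is hypothetical. Because it reads the input only once, it also works on a pipe:

```shell
# Hypothetical helper: print the lines whose columns 1, 3 and 4 occur more
# than once, in their original input order (single pass, input kept in memory).
dups_in_order() {
    awk -F';' '
        {
            key = $1 FS $3 FS $4
            cnt[key]++          # how many lines share this key
            line[NR] = $0       # the line itself, by line number
            keyof[NR] = key     # its key, by line number
        }
        END {
            # NR still holds the total record count in END
            for (i = 1; i <= NR; i++)
                if (cnt[keyof[i]] > 1)
                    print line[i]
        }
    ' "$@"
}

printf 'a;3;c;1\na;4;b;2\na;5;c;1\n' | dups_in_order
```

A numeric loop from 1 to NR has a fixed order, so the duplicates are printed exactly where they appeared in the input.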



Answer 2:

Give this one-liner a try:

awk -F';' '{k=$1 FS $3 FS $4}
    NR==FNR{if(a[k]){p[a[k]];p[NR]}a[k]=NR;next}FNR in p' file file

It reads the file twice. On the first pass it records the numbers of the lines that should be printed, and on the second pass it prints those lines.
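Here is the same one-liner expanded with comments, so that each step of the two passes is visible. The wrapper name `dups_two_pass` is hypothetical. The demo writes the sample input to a temporary file, assuming `mktemp` is available, because a two-pass script has to read a real file twice and cannot reread a pipe:

```shell
# Hypothetical wrapper around the one-liner above; the file is passed twice.
dups_two_pass() {
    awk -F';' '
        { k = $1 FS $3 FS $4 }         # key: columns 1, 3 and 4
        NR == FNR {                    # first pass over the file
            if (a[k]) {                # key already seen, last on line a[k]
                p[a[k]]                # mark that earlier line
                p[NR]                  # and the current line
            }
            a[k] = NR                  # remember the latest line with this key
            next
        }
        FNR in p                       # second pass: print the marked lines
    ' "$1" "$1"
}

tmp=$(mktemp)
printf 'a;3;c;1\na;4;b;2\na;5;c;1\n' > "$tmp"
dups_two_pass "$tmp"
rm -f "$tmp"
```

A bare array reference such as `p[NR]` creates the element. That is enough for the later `FNR in p` test to succeed, so no value has to be assigned.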



Answer 3:

Here is my solution:

awk -F';' 'NR==1{ split($0, a, ";"); print; next } a[1]==$1 && a[3]==$3 && a[4]==$4' file

Output:

a;3;c;1
a;5;c;1

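Of course, this only works if the first line is the one the duplicates are compared against. The script prints line 1 unconditionally and then prints every later line whose columns 1, 3 and 4 match it.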
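Answer 3 also prints line 1 when nothing else matches it. If that matters, one option is to hold line 1 back and print it just before its first match. This still assumes that line 1 is the reference line. The function name `dups_of_first` and the flag `shown` are hypothetical:

```shell
# Hypothetical variant of Answer 3: compare every line with line 1, but print
# line 1 only if at least one later line matches it on columns 1, 3 and 4.
dups_of_first() {
    awk -F';' '
        NR == 1 { split($0, ref, ";"); first = $0; next }   # remember line 1
        ref[1] == $1 && ref[3] == $3 && ref[4] == $4 {
            if (!shown) { print first; shown = 1 }            # emit line 1 once
            print
        }
    ' "$@"
}

printf 'a;3;c;1\na;4;b;2\na;5;c;1\n' | dups_of_first
```

For input that has no matches at all, this prints nothing. To find duplicates anywhere in the file, the key-counting approaches in the first two answers are still needed.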