How to remove duplicate entries from a file using a shell script

Posted 2019-08-15 10:15

Question:

I have a file that is in the format:

0000000540|Q1.1|margi|Q1.1|margi|Q1.1|margi
0099940598|Q1.2|8888|Q1.3|5454|Q1.2|8888    
0000234223|Q2.10|saigon|Q3.9|tango|Q1.1|money

I am trying to remove the duplicates that appear on the same line.

So, if a line has

0000000540|Q1.1|margi|Q1.1|margi|Q1.1|margi

I'd like it to be

0000000540|Q1.1|margi

If the line has

0099940598|Q1.2|8888|Q1.3|5454|Q1.2|8888

I'd like it to be

0099940598|Q1.2|8888|Q1.3|5454

I would like to do this with a shell script that takes an input file and outputs the file without the duplicates.

Thanks in advance to anyone who can help.

Answer 1:

This should do it, but it may not be efficient for large files.

awk '
    {
        delete p;                     # clear the array of fields seen on this line
                                      # ("delete array" is a common awk extension;
                                      # split("", p) is the POSIX-portable form)
        n = split($0, a, "|");        # split the line on "|" into a[1..n]

        printf("%s", a[1]);           # always print the leading ID field

        for (i = 2; i <= n; i++)
        {
            if (!(a[i] in p))         # print a field only the first time it appears
            {
                printf("|%s", a[i]);
                p[a[i]] = "";         # remember the field so later copies are skipped
            }
        }

        printf "\n";
    }
' YourFileName
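
To package this as a shell script that takes the input file as an argument and writes the de-duplicated output to stdout, a minimal wrapper could look like the sketch below (the script name dedupe_fields.sh is just a placeholder; it uses the POSIX-portable split("", p) to clear the array):

#!/bin/sh
# dedupe_fields.sh -- hypothetical wrapper around the awk program above.
# Usage: ./dedupe_fields.sh input.txt > output.txt
awk '
    {
        split("", p);                 # empty the seen-fields array for this line
        n = split($0, a, "|");        # split the line on "|" into a[1..n]
        printf("%s", a[1]);           # keep the leading ID field
        for (i = 2; i <= n; i++)
            if (!(a[i] in p)) {       # keep only the first occurrence of a field
                printf("|%s", a[i]);
                p[a[i]] = "";
            }
        printf "\n";
    }
' "$1"

Note that this de-duplicates individual fields. If the same value could legitimately appear under two different question codes (e.g. Q1.1|margi|Q2.5|margi), you would want to key the seen-array on the code/value pair instead of on each field separately.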


Tags: shell