Count the number of columns in a pipe-delimited file

Posted 2019-06-25 22:47

I have a pipe | delimited file.

File:

106232145|"medicare"|"medicare,medicaid"|789

I would like to count the number of fields in each line. I tried the code below.

Code:

awk -F '|' '{print NF-1}'

This returns 5 instead of 4, because awk treats "medicare|medicaid" as two separate fields instead of one.

5 answers
在下西门庆
#2 · 2019-06-25 23:11

perl -ne 'print scalar( split( /\|/, $_ ) ) . "\n"' [filename]

Deceive 欺骗
#3 · 2019-06-25 23:15

For a |-delimited file with | embedded inside quoted fields, this should work with GNU awk 4.0 or later:

gawk '{ print NF }' FPAT="([^|]+)|(\"[^\"]+\")"
闹够了就滚
#4 · 2019-06-25 23:19
awk -F\| '{print NF}'

gives the correct result.

贼婆χ
#5 · 2019-06-25 23:25

Pure Unix solution (without awk/Perl):

$ cat  /tmp/x1
1|2|3|34
4534|23442|1121|334434

$ head -1 /tmp/x1 | tr "|" "\012" | wc -l
4
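
The head -1 pipeline above only looks at the first line, while the question asks for a count on each line. A minimal sketch of the same tr/wc idea applied line by line follows; the function name count_fields_per_line is made up for illustration, and it assumes unquoted data like /tmp/x1, where no field contains a pipe:

```shell
# Hypothetical helper: print the number of |-separated fields
# for each line read from standard input.
# Assumption: no field contains a pipe (no quoting is handled here).
count_fields_per_line() {
    while IFS= read -r line || [ -n "$line" ]; do
        # Turn every pipe into a newline, then count the resulting lines.
        # The final tr strips the padding some wc implementations add.
        printf '%s\n' "$line" | tr '|' '\n' | wc -l | tr -d ' '
    done
}

# Same data as /tmp/x1 above.
printf '1|2|3|34\n4534|23442|1121|334434\n' | count_fields_per_line
```

This starts three processes per line, so it is only reasonable for small files; the awk -F'|' one-liner does the same job in a single process.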

Perl solution - 1-liner:

$ perl5.8 -naF'\|' -e 'print scalar(@F)."\n";exit;' /tmp/x1
4

BUT!!!! IMPORTANT!!!

None of these solutions, nor those in the other answers, works 100% of the time!

Namely, they all break on a REAL "pipe-separated" file, where a pipe can be a valid character inside a quoted field, the way real CSV files work.

E.g.

$ cat /tmp/x2
"0|1"|2|3|34
4534|23442|1121|334434
$ perl5.8 -naF'\|' -e 'print scalar(@F)."\n";exit;' /tmp/x2
5   <----- BROKEN!!! There are only 4 fields, first field is "0|1"

To fix that, a proper CSV (or delimited file) parser should be used, such as one in Perl:

$ perl5.8 -MText::CSV_XS \
-ne '$csv=Text::CSV_XS->new({sep_char => "|"});  $csv->parse($_);
@fields = $csv->fields(); print scalar(@fields), "\n"; exit;' /tmp/x2

This prints the correct value:

4

As a note, simply fixing an awk or sed solution with a convoluted RegEx won't work easily, since on top of pipe-containing-and-quoted PSV fields, the spec also allows quotes as part of the field as well. That does NOT lend itself to a nice RegEx solution.
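
Where a CSV module is not available, one alternative to a regex is to scan each line character by character and track whether the scan is currently inside quotes. The sketch below is only an illustration of that idea, not a replacement for a real parser. It assumes CSV-style quoting in which a literal quote inside a quoted field is written as two quotes, and it assumes fields never contain newlines. The function name count_quoted_fields is hypothetical.

```shell
# Hypothetical helper: count fields per line, ignoring pipes inside double quotes.
# Assumptions: a quote inside a quoted field is written as "" (CSV style),
# and no field spans more than one line.
count_quoted_fields() {
    awk '{
        n = 1            # a line always has at least one field
        in_quotes = 0
        len = length($0)
        for (i = 1; i <= len; i++) {
            c = substr($0, i, 1)
            if (c == "\"")
                in_quotes = !in_quotes   # a doubled "" toggles twice, so the state is unchanged
            else if (c == "|" && !in_quotes)
                n++                      # only unquoted pipes separate fields
        }
        print n
    }' "$@"
}

# Same data as /tmp/x2 above.
printf '%s\n' '"0|1"|2|3|34' '4534|23442|1121|334434' | count_quoted_fields
```

Because every quote character flips the state, an escaped pair of quotes leaves it where it was, which is why no special case is needed for them. Malformed input, such as an unbalanced quote, is not detected.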

Fickle 薄情
#6 · 2019-06-25 23:28
$ cat fieldparse.awk
#NR > 1 { print "--"; }

# Uncomment printf/print in the for loops to see
#   each field on a separate line as well as the commented line above (to show that it works).
{
    nfields = 0;
    for (i = 1; i <= NF; i++) {
        if ($i ~ /^".*[^"]$/)
            for (; i <= NF && ($i !~ /.*"$/); i++) {
                #printf("%s%s", $i, FS);
            }
        #print $i;
        nfields++;
    }
    print nfields;
    if (FILENAME == "-")
        FILENAME = "(standard input)";
    filenames[FILENAME] = sprintf("%d %d", FNR, nfields);
}

END {
    print NR, "total records processed";
    for (f in filenames) {
        split(filenames[f], fn, " ");
        printf("\t* %s: %d records with %d fields\n", f, fn[1], fn[2]);
    }
}

$ awk -F'|' -f fieldparse.awk demo.txt

It works for any single character separator that is NOT a double quotation mark, meaning standard tab delimited, CSV, etc. formats (as standard as they get anyway...)

The output format is merely illustrative and a bit decorative at the end, but the content is still useful IMHO, such as handling multiple files. In any case, I hope it helps! :-)

Edit

This was tested using mawk and GNU awk (gawk); gawk was tested in traditional, POSIX, and default modes. With the comments and output statements trimmed, it is actually a small program, though not as small as one might like.
