Count the number of columns in a pipe-delimited file

I have a pipe-delimited (|) file.

File:

106232145|"medicare"|"medicare,medicaid"|789

I would like to count the number of fields in each line. I tried the following code:

Code:

awk -F '|' '{print NF-1}'

This returns 5 instead of 4, because awk treats "medicare|medicaid" as two separate fields instead of one.

Tags: linux perl shell awk

5 answers

在下西门庆

#2 · 2019-06-25 23:11

perl -lne '@f = split /\|/, $_, -1; print scalar @f' [filename]


Deceive 欺骗

#3 · 2019-06-25 23:15

For a |-delimited file with embedded | characters inside quoted fields, this should work with GNU awk (gawk) 4.0 or later:

gawk -v FPAT='([^|]+)|("[^"]+")' '{ print NF }' [filename]


闹够了就滚

#4 · 2019-06-25 23:19

awk -F\| '{print NF}'

gives the correct result: NF is the number of fields, whereas NF-1 is the number of separators.
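To see the difference between the question's `NF-1` and this answer's `NF`, here is a minimal check on the sample line from the question, fed in with `printf` instead of read from a file:

```shell
# Sample line from the question (no quoted field here contains a pipe).
line='106232145|"medicare"|"medicare,medicaid"|789'

# NF is the number of fields on the line.
printf '%s\n' "$line" | awk -F'|' '{ print NF }'

# NF - 1 is the number of "|" separators, always one fewer than the fields.
printf '%s\n' "$line" | awk -F'|' '{ print NF - 1 }'
```

This prints 4, then 3.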


贼婆χ

#5 · 2019-06-25 23:25

Pure Unix solution (without awk/Perl):

$ cat /tmp/x1
1|2|3|34
4534|23442|1121|334434

$ head -n 1 /tmp/x1 | tr "|" "\012" | wc -l
4
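The `head` pipeline above only looks at the first line. A minimal sketch that prints a count for every line with the same kind of tools (the function name `count_fields_sh` is hypothetical, and like the one-liner it counts every `|`, quoted or not):

```shell
# Hypothetical helper: print the field count of each input line using only
# POSIX shell tools. NOT quote-aware: every "|" counts as a separator.
count_fields_sh() {
  # The "|| [ -n "$line" ]" part also handles a last line with no newline.
  while IFS= read -r line || [ -n "$line" ]; do
    # Delete every character except "|" and count what is left.
    seps=$(printf '%s' "$line" | tr -cd '|' | wc -c)
    # $seps may carry leading spaces from wc; arithmetic expansion ignores them.
    echo $(( $seps + 1 ))
  done
}

printf '1|2|3|34\n4534|23442|1121|334434\n' | count_fields_sh
```

This prints 4 for each of the two lines. Note that an empty line is reported as one (empty) field here, whereas awk reports `NF` as 0 for it.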

Perl solution - 1-liner:

$ perl5.8 -naF'\|' -e 'print scalar(@F)."\n";exit;' /tmp/x1
4

BUT!!!! IMPORTANT!!!

Every one of these solutions, as well as those in other answers, does NOT work 100%!

Namely, they all break on a REAL "pipe-separated" file, where a pipe is a valid character inside a (quoted) field, the way real CSV files work.

E.g.

$ cat /tmp/x2
"0|1"|2|3|34
4534|23442|1121|334434
$ perl5.8 -naF'\|' -e 'print scalar(@F)."\n";exit;' /tmp/x2
5   <----- BROKEN!!! There are only 4 fields, first field is "0|1"

To fix that, use a proper CSV (or delimited-file) parser, such as Perl's Text::CSV_XS:

$ perl5.8 -MText::CSV_XS -lne '
    BEGIN { $csv = Text::CSV_XS->new({ sep_char => "|" }) }
    $csv->parse($_) or die "cannot parse: $_\n";
    print scalar(my @f = $csv->fields());
    exit;' /tmp/x2

This prints the correct value, 4.

As a note, simply patching an awk or sed solution with a convoluted regex won't work easily: on top of quoted PSV fields that contain pipes, the format also allows quotes as part of a field (as in CSV, a literal quote inside a quoted field is written as two quotes). That does NOT lend itself to a nice regex solution.
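One way around the regex problem is to skip regexes and scan each line character by character, tracking whether you are inside quotes. A minimal POSIX awk sketch, assuming fields are optionally wrapped in double quotes, embedded quotes are doubled (`""`) as in CSV, and no record spans more than one line (the function name `count_psv_fields` is hypothetical):

```shell
# Hypothetical helper: quote-aware field count for each line of PSV input.
# Assumes "" escapes a quote inside a quoted field and records do not span lines.
count_psv_fields() {
  awk '{
    n = 1; inq = 0
    for (i = 1; i <= length($0); i++) {
      c = substr($0, i, 1)
      if (c == "\"") inq = !inq            # a doubled "" flips twice: no net change
      else if (c == "|" && !inq) n++       # only unquoted pipes separate fields
    }
    print n
  }' "$@"
}

printf '%s\n' '"0|1"|2|3|34' '"a ""quoted|pipe"""|b' | count_psv_fields
```

For the two sample lines it prints 4 and 2. Because an escaped `""` toggles the quote flag twice, it leaves the state unchanged, which is exactly the case a single regex struggles with.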


Fickle 薄情

#6 · 2019-06-25 23:28

$ cat fieldparse.awk
#NR > 1 { print "--"; }

# Uncomment the printf/print statements in the for loops, and the NR > 1 rule
#   above, to print each field on its own line (to show that it works).
{
    nfields = 0;
    for (i = 1; i <= NF; i++) {
        if ($i ~ /^".*[^"]$/)
            for (; i <= NF && ($i !~ /.*"$/); i++) {
                #printf("%s%s", $i, FS);
            }
        #print $i;
        nfields++;
    }
    print nfields;
    if (FILENAME == "-")
        FILENAME = "(standard input)";
    filenames[FILENAME] = sprintf("%d %d", FNR, nfields);
}

END {
    print NR, "total records processed";
    for (f in filenames) {
        split(filenames[f], fn, " ");
        printf("\t* %s: %d records with %d fields\n", f, fn[1], fn[2]);
    }
}

$ awk -F'|' -f fieldparse.awk demo.txt

It works with any single-character separator other than a double quotation mark, so it covers tab-delimited, CSV, and similar formats (as standard as they get, anyway...).

The output format is merely illustrative (and a bit decorative at the end), but the content is still useful IMHO, such as the per-file summary when handling multiple files (it reports the field count of each file's last record). In any case, I hope it helps! :-)

Edit

This was tested with mawk and GNU awk (gawk), the latter in traditional, POSIX, and default modes. With the comments and output statements trimmed, it is actually a small program, though not as small as one might like.

