I have a pipe-delimited (|) file.
File:
106232145|"medicare"|"medicare,medicaid"|789
I would like to count the number of fields in each line. I tried the code below.
Code:
awk -F '|' '{print NF-1}'
This returns 5 instead of 4, because awk treats "medicare|medicaid" as two separate fields instead of one.
perl -ne 'print scalar( split( /\|/, $_ ) ) . "\n"' [filename]
For a | delimited file with embedded | in between, this should work with GNU awk v4.0 or later:
It gives the correct result.
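If GNU awk isn't at hand, a portable sketch of a similar idea (my own assumption, not the command referred to above) is to collapse every quoted section into a placeholder before awk counts the fields. It assumes quoted fields never contain escaped quotes or newlines, and the function name count_fields_simple is purely illustrative:

```shell
# Hypothetical helper: count pipe-separated fields per line,
# treating anything between double quotes as part of one field.
count_fields_simple() {
    awk -F'|' '{
        # Replace each "..." section with a placeholder; changing $0
        # makes awk re-split the record, so NF ignores quoted pipes.
        gsub(/"[^"]*"/, "Q")
        print NF
    }' "$@"
}

# Sample line with an embedded pipe inside a quoted field
printf '%s\n' '106232145|"medicare"|"medicare|medicaid"|789' | count_fields_simple
# prints 4
```

This only holds while quoting stays simple; once a quote can appear inside a field, the substitution no longer lines up with field boundaries.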
Pure Unix solution (without awk/Perl):
Perl solution - 1-liner:
BUT!!!! IMPORTANT!!!
Every one of these solutions, as well as those in other answers, does NOT work 100% of the time!
Namely, they all break when it's a REAL "pipe-separated" file, with a pipe being a valid character in the field (and the field being quoted), the way real CSV files work.
E.g.
To fix that, a proper CSV (or delimited file) parser should be used, such as one in Perl:
Prints correct value
As a note, simply fixing an awk or sed solution with a convoluted RegEx won't work easily, since on top of pipe-containing-and-quoted PSV fields, the spec also allows quotes as part of the field as well. That does NOT lend itself to a nice RegEx solution.

It works for any single character separator that is NOT a double quotation mark, meaning standard tab delimited, CSV, etc. formats (as standard as they get anyway...)
The output format is merely illustrative and a bit decorative at the end, but the content is still useful IMHO, such as handling multiple files. In any case, I hope it helps! :-)
Edit
This was tested using mawk and GNU awk (gawk), the latter in traditional, POSIX and default modes. Trim the comments and output statements and it is actually a small program, though not as small as one might like.
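The program itself isn't reproduced here, so as a rough sketch of the general approach (not the original code), a character-by-character scan in plain awk can track whether it is inside double quotes and only count separators found outside them. The function name count_fields and its argument order are assumptions made for illustration. A doubled quote ("") inside a quoted field toggles the state twice, so it doesn't disturb the count; quoted fields that span several lines are not handled.

```shell
# Hypothetical helper: count_fields SEP [FILE...]
# Prints the number of fields on each input line, ignoring any
# separator that appears between double quotes.
count_fields() {
    cf_sep=$1
    shift
    awk -v sep="$cf_sep" '{
        len = length($0)
        n = (len > 0) ? 1 : 0        # an empty line has no fields
        inq = 0                      # 1 while inside a quoted section
        for (i = 1; i <= len; i++) {
            c = substr($0, i, 1)
            if (c == "\"")
                inq = !inq           # a quote toggles the quoted state
            else if (c == sep && !inq)
                n++                  # unquoted separator starts a new field
        }
        print n
    }' "$@"
}

printf '%s\n' '106232145|"medicare"|"medicare|medicaid"|789' | count_fields '|'
# prints 4
```

Passing file names after the separator runs the same count over several files, since awk reads them in sequence.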