Let's say I have a CSV file like this:
a,b1,12,
a,b1,42,
d,e1,12,
r,12,33,
I want to use grep to return only the rows where the third column is 12. So it would return:
a,b1,12,
d,e1,12,
but not:
r,12,33,
Any ideas for a regular expression that will allow me to do this?
I'd jump straight to awk to test the value exactly:
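For example (assuming the data is in a file named file.csv; -F, splits each line on commas and $3 is the third field):

    awk -F, '$3 == 12' file.csv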
This, and any regexp-based solution, assumes that the values of the first two fields do not contain commas.
Line-oriented Linux tools cannot reliably process CSV, because quoted fields can contain embedded newline characters according to RFC 4180. Most dedicated utilities are garbage for various reasons.
Here's a Node.js v7.10+ single-file executable that "just works" and produces converted JSON objects, one per line. It should run on Linux, macOS, and Windows.
Usage for a file with a header line:
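Something like the following, using the sketch shown further down (its -h flag, meaning "first record is a header row", is an assumption of that sketch, not necessarily the original tool's interface):

    csv1480json -h file.csv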
Without a header line:
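Field names then fall back to zero-based column indexes:

    csv1480json file.csv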
The grep becomes:
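With the sample file (no header), the sketch keys fields by zero-based index, so the third column is key "2"; something like:

    csv1480json file.csv | grep '"2":"12"'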
On the direct text you can do:
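The original command is lost here; a plausible equivalent on the raw CSV (again assuming no embedded commas) is:

    grep '^[^,]*,[^,]*,12,' file.csv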
Paste this as csv1480json somewhere in your PATH and make it executable:
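The original script is not reproduced here; below is a minimal sketch of such a converter. It hand-rolls an RFC 4180 state machine (quoted fields, doubled quotes, embedded newlines), and its -h header flag is illustrative; the real csv1480json may well have differed:

    #!/usr/bin/env node
    // Minimal sketch, not the original tool: an RFC 4180 CSV -> JSON-lines
    // converter. Reads the file named as the first argument (or stdin).
    // The -h flag ("first record is a header row") is illustrative.
    'use strict';
    const fs = require('fs');

    const args = process.argv.slice(2);
    const useHeader = args.includes('-h');
    const file = args.filter(a => a !== '-h')[0];
    const input = fs.readFileSync(file === undefined ? 0 : file, 'utf8');

    // State-machine parser: handles quoted fields containing commas,
    // doubled quotes ("") and embedded newlines, per RFC 4180.
    function parseCsv(text) {
      const records = [];
      let record = [];
      let field = '';
      let inQuotes = false;
      for (let i = 0; i < text.length; i++) {
        const c = text[i];
        if (inQuotes) {
          if (c === '"') {
            if (text[i + 1] === '"') { field += '"'; i++; } // "" -> literal quote
            else inQuotes = false;                          // closing quote
          } else {
            field += c;                                     // incl. commas, newlines
          }
        } else if (c === '"') {
          inQuotes = true;
        } else if (c === ',') {
          record.push(field); field = '';
        } else if (c === '\r' || c === '\n') {
          if (c === '\r' && text[i + 1] === '\n') i++;      // swallow CRLF
          record.push(field); field = '';
          records.push(record); record = [];
        } else {
          field += c;
        }
      }
      if (field !== '' || record.length > 0) {              // no trailing newline
        record.push(field);
        records.push(record);
      }
      return records;
    }

    const rows = parseCsv(input);
    const header = useHeader ? rows.shift() : null;
    for (const row of rows) {
      const obj = {};
      row.forEach((value, i) => { obj[header ? header[i] : i] = value; });
      process.stdout.write(JSON.stringify(obj) + '\n');
    }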
csvkit is a great toolkit for stuff like this, especially at larger scale. After installing csvkit, use the following to isolate the rows you want:
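For example (file name assumed; csvgrep keeps rows whose third column matches the regex, csvlook renders the result as a table, and -H tells both tools the sample file has no header row):

    csvgrep -H -c 3 -r '^12$' file.csv | csvlook -H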
This should prettily print out the rows you want. The full documentation for csvkit (and a well-written tutorial) can be found here.
When you have CSV files with a distinct delimiter such as a comma, use the field-splitting approach, not regular expressions. Tools that break strings up into fields, like awk, Perl, or Python, do the job easily for you (Perl and Python also have CSV modules for more complex CSV parsing).
Perl:
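A one-liner along these lines (file name assumed; -F, autosplits each line on commas into @F, and $F[2] is the third field):

    perl -F, -ane 'print if $F[2] == 12' file.csv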
or with just the shell:
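A POSIX sh sketch, splitting each line on commas with read (file name assumed; rest soaks up any remaining fields):

    while IFS=, read -r f1 f2 f3 rest; do
      [ "$f3" = 12 ] && printf '%s\n' "$f1,$f2,$f3,$rest"
    done < file.csv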
Here's a variation:
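A minimal sketch, assuming the data is in file.csv; {2} skips the first two comma-terminated fields, so the match lands on the third:

    grep -E '^([^,]*,){2}12,' file.csv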
The advantage is that you can select the field simply by changing the number enclosed in curly braces without having to add or subtract literal copies of the pattern manually.