Say I have the following CSV file:
id,message,time
123,"Sorry, This message
has commas and newlines",2016-03-28T20:26:39
456,"It makes the problem non-trivial",2016-03-28T20:26:41
I want to write a bash command that will return only the time column, i.e.:
time
2016-03-28T20:26:39
2016-03-28T20:26:41
What is the most straightforward way to do this? You can assume the availability of standard Unix utilities such as awk, gawk, cut, grep, etc.
Note the double quotes ("), which escape commas and newline characters and make trivial attempts such as
cut -d , -f 3 file.csv
futile.
CSV is a format that needs a proper parser (i.e. it can't be parsed with regular expressions alone). If you have Python installed, use the csv module instead of plain Bash. If not, consider csvkit, which has a lot of powerful tools for processing CSV files from the command line.
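As a minimal sketch of the csv-module approach, the following reads the sample data from the question and returns the third column by position. The function name `extract_column` and the in-memory `sample` string are illustrative assumptions, not part of the original answer; with a real file you would pass an open file object (opened with `newline=""`) to `csv.reader` instead.

```python
import csv
import io

# Sample data copied from the question; the quoted field spans two lines.
sample = (
    'id,message,time\n'
    '123,"Sorry, This message\n'
    'has commas and newlines",2016-03-28T20:26:39\n'
    '456,"It makes the problem non-trivial",2016-03-28T20:26:41\n'
)


def extract_column(text, index):
    """Return the values of one column (0-based index) from CSV text."""
    # newline="" leaves embedded newlines untouched so the csv module
    # can decide which ones belong to quoted fields.
    reader = csv.reader(io.StringIO(text, newline=""))
    return [row[index] for row in reader if row]


for value in extract_column(sample, 2):
    print(value)
```

Because the parser tracks quoting itself, the comma and the newline inside the message field do not shift the column positions.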
As said here, to handle specifically those newlines that are inside double-quoted strings, and leave alone those that are outside them, you can use GNU awk (needed for RT). This works by splitting the file along " characters and removing newlines in every other block.
Then use awk to split the columns and display the last column.
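The two steps described above can be sketched in Python rather than gawk, purely to show the logic: split the text on `"` characters, strip newlines from every other block (the ones that sit between a pair of quotes), rejoin, and then take whatever follows the last comma on each line. The helper names `join_quoted_newlines` and `last_column` are hypothetical, and the sketch assumes the wanted column is the last one and contains no commas itself.

```python
# Sample data copied from the question.
sample = (
    'id,message,time\n'
    '123,"Sorry, This message\n'
    'has commas and newlines",2016-03-28T20:26:39\n'
    '456,"It makes the problem non-trivial",2016-03-28T20:26:41\n'
)


def join_quoted_newlines(text):
    """Remove newlines that occur between a pair of double quotes."""
    blocks = text.split('"')
    # Blocks at odd indexes (1, 3, 5, ...) lie inside quotes.
    for i in range(1, len(blocks), 2):
        blocks[i] = blocks[i].replace("\n", "")
    return '"'.join(blocks)


def last_column(text):
    """Return the text after the last comma of each logical line."""
    lines = join_quoted_newlines(text).splitlines()
    return [line.rsplit(",", 1)[-1] for line in lines]


for value in last_column(sample):
    print(value)
```

Taking the last field sidesteps the commas inside the quoted message, which is why a plain split on every comma would still give wrong results for the middle column.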
As chepner said, you are encouraged to use a programming language that is able to parse CSV.
Here comes an example in Python:
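The code from this answer did not survive, so what follows is only a hedged sketch of what such a Python example might look like, not the author's original. It selects the column by its header name using `csv.DictReader`; the function name `column_by_name` and the in-memory `sample` are assumptions made for illustration.

```python
import csv
import io

# Sample data copied from the question.
sample = (
    'id,message,time\n'
    '123,"Sorry, This message\n'
    'has commas and newlines",2016-03-28T20:26:39\n'
    '456,"It makes the problem non-trivial",2016-03-28T20:26:41\n'
)


def column_by_name(stream, name):
    """Return the values under the given header from a CSV stream."""
    # DictReader consumes the first row as the header and maps each
    # following row to a dict keyed by those header names.
    reader = csv.DictReader(stream)
    return [row[name] for row in reader]


print("time")
for value in column_by_name(io.StringIO(sample, newline=""), "time"):
    print(value)
```

Selecting by name rather than position keeps the script working if columns are later reordered; the header line is printed separately because `DictReader` does not yield it as a data row.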
Another awk alternative using FS
I ran into something similar when attempting to deal with lspci -m output, but the embedded newlines would need to be escaped first (though IFS=, should work here, since it abuses bash's quote evaluation). Here's an example:
And the only reasonable way I can find to bring that into bash is along the lines of:
Not a full answer, but might help!