Bash: Parse CSV with quotes, commas and newlines

Say I have the following csv file:

 id,message,time
 123,"Sorry, This message
 has commas and newlines",2016-03-28T20:26:39
 456,"It makes the problem non-trivial",2016-03-28T20:26:41

I want to write a bash command that will return only the time column. i.e.

time
2016-03-28T20:26:39
2016-03-28T20:26:41

What is the most straight forward way to do this? You can assume the availability of standard unix utils such as awk, gawk, cut, grep, etc.

Note the presence of "" which escape , and newline characters which make trivial attempts with

cut -d , -f 3 file.csv

futile.

标签： bash csv awk cut gawk

7条回答

甜甜的少女心

2楼-- · 2020-04-02 07:56

CSV is a format which needs a proper parser (i.e. which can't be parsed with regular expressions alone). If you have Python installed, use the csv module instead of plain BASH.

If not, consider csvkit which has a lot of powerful tools to process CSV files from the command line.

See also:

https://unix.stackexchange.com/questions/7425/is-there-a-robust-command-line-tool-for-processing-csv-files

0人赞添加讨论(0) 举报

虎瘦雄心在

3楼-- · 2020-04-02 07:59

As said here

gawk -v RS='"' 'NR % 2 == 0 { gsub(/\n/, "") } { printf("%s%s", $0, RT) }' file.csv \
 | awk -F, '{print $NF}'

To handle specifically those newlines that are in doubly-quoted strings and leave those alone that are outside them, using GNU awk (for RT):

gawk -v RS='"' 'NR % 2 == 0 { gsub(/\n/, "") } { printf("%s%s", $0, RT) }' file

This works by splitting the file along " characters and removing newlines in every other block.

Output

time
2016-03-28T20:26:39
2016-03-28T20:26:41

Then use awk to split the columns and display the last column

0人赞添加讨论(0) 举报

Deceive 欺骗

4楼-- · 2020-04-02 08:00

awk -F, '!/This/{print $NF}' file

time
2016-03-28T20:26:39
2016-03-28T20:26:41

0人赞添加讨论(0) 举报

爷、活的狠高调

5楼-- · 2020-04-02 08:05

As chepner said, you are encouraged to use a programming language which is able to parse csv.

Here comes an example in python:

import csv

with open('a.csv', 'rb') as csvfile:
    reader = csv.reader(csvfile, quotechar='"')
    for row in reader:
        print(row[-1]) # row[-1] gives the last column

0人赞添加讨论(0) 举报

爷、活的狠高调

6楼-- · 2020-04-02 08:11

another awk alternative using FS

$ awk -F'"' '!(NF%2){getline remainder;$0=$0 OFS remainder}
                NR>1{sub(/,/,"",$NF); print $NF}' file

2016-03-28T20:26:39
2016-03-28T20:26:41

0人赞添加讨论(0) 举报

干净又极端

7楼-- · 2020-04-02 08:15

I ran into something similar when attempting to deal with lspci -m output, but the embedded newlines would need to be escaped first (though IFS=, should work here, since it abuses bash' quote evaluation). Here's an example

f:13.3 "System peripheral" "Intel Corporation" "Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D Memory Controller 0 - Channel Target Address Decoder" -r01 "Super Micro Computer Inc" "Device 0838"

And the only reasonable way I can find to bring that into bash is along the lines of:

# echo 'f:13.3 "System peripheral" "Intel Corporation" "Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D Memory Controller 0 - Channel Target Address Decoder" -r01 "Super Micro Computer Inc" "Device 0838"' | { eval array=($(cat)); declare -p array; }
declare -a array='([0]="f:13.3" [1]="System peripheral" [2]="Intel Corporation" [3]="Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D Memory Controller 0 - Channel Target Address Decoder" [4]="-r01" [5]="Super Micro Computer Inc" [6]="Device 0838")'
#

Not a full answer, but might help!

0人赞添加讨论(0) 举报

1 2 下一页

Bash: Parse CSV with quotes, commas and newlines

采纳回答

编辑标签

举报内容

检举类型

检举原因

检举说明(必填)

打开微信“扫一扫”，打开网页后点击屏幕右上角分享按钮

付费偷看金额在0.1-10元之间