I have data in a .csv column that sometimes contains commas and newlines. Wherever a value contains a comma, I have enclosed the entire string in double quotes. How would I go about writing that column out to a .txt file while taking the newlines and commas into consideration?
Sample data that doesn't work with my command:
,"This is some text with a , in it.", #data with commas are enclosed in double quotes
,line 1 of data
line 2 of data, #data with a couple of newlines
,"Data that may a have , in it and
also be on a newline as well.",
Here is what I have so far:
awk -F "\"*,\"*" '{print $4}' file.csv > column_output.txt
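The command above trips over this data for two reasons: the -F regex splits on every comma, including the ones inside double quotes, and awk reads one physical line per record, so a quoted field that spans lines gets cut in two. A minimal sketch of one way around both follows, assuming GNU awk 4.0 or later (for FPAT and RT). The function name csv_column is made up for illustration, and the sketch assumes fields contain no escaped quotes ("" or \") and that any embedded newline sits inside a quoted field.

```shell
# Hypothetical helper: print one column of a CSV, tolerating quoted
# commas and quoted newlines. Requires GNU awk (gawk) for FPAT and RT.
# Usage: csv_column COLUMN_NUMBER [FILE...]   (reads stdin if no FILE)
csv_column() {
    col=$1
    shift
    gawk -v col="$col" '
    # A field is either a run of non-commas or a double-quoted string.
    BEGIN { FPAT = "([^,]*)|(\"[^\"]*\")" }
    {
        # Prepend any unfinished record carried over from earlier lines.
        $0 = prev $0
        # gsub returns how many quotes it saw; an odd count means a
        # quoted field is still open, so keep reading.
        if (gsub(/"/, "&") % 2) {
            prev = $0 RT    # RT holds the newline that ended this line
            next
        }
        prev = ""
        field = $col
        gsub(/^"|"$/, "", field)    # drop the enclosing quotes, if any
        print field
    }' "$@"
}
```

With that defined, something like csv_column 4 file.csv > column_output.txt would stand in for the original command. Counting quotes to decide whether a record is complete is the simplest approach that works here, but it is also why escaped quotes would need extra handling.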
The above uses GNU awk for FPAT and RT. I don't know of any CSV format that would allow a newline in the middle of a field that's not enclosed in quotes (if one did, you'd never know where any record ended), so the script doesn't allow for that. The above was run on this input file: