I have a CSV file, but unlike in related questions, it has some columns containing double-quoted strings with commas, e.g.
foo,bar,baz,quux
11,"first line, second column",13.0,6
210,"second column of second line",23.1,5
(Of course, the real file is longer, the number of quoted commas in a field is not necessarily zero or one, and the text is not predictable.) The text might also contain (escaped) double quotes within double quotes, or a field that is usually quoted might have no double quotes at all. The only assumption we can make is that there are no quoted newlines, so we can trivially split lines on \n.
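To illustrate what those quoting rules mean for a parser, here is a minimal sketch using Python's standard csv module, which already treats a comma inside a double-quoted field as data rather than as a delimiter. It assumes python3 is available and that the "escaped" quotes follow the RFC 4180 convention of doubling them (""); if they are backslash-escaped instead, passing escapechar="\\" to csv.reader would probably be needed. The helper name split_csv_line is hypothetical.

```python
import csv


def split_csv_line(line: str) -> list[str]:
    """Split one CSV line into fields, honouring double-quoted fields.

    Assumes RFC 4180-style quoting: a literal double quote inside a
    quoted field is written as two double quotes ("").
    """
    # csv.reader accepts any iterable of strings; a one-element list is enough.
    return next(csv.reader([line]))


print(split_csv_line('11,"first line, second column",13.0,6'))
# -> ['11', 'first line, second column', '13.0', '6']
print(split_csv_line('7,"she said ""hi, there""",1.5,2'))
# -> ['7', 'she said "hi, there"', '1.5', '2']
```

Fields that happen to be unquoted (e.g. 210,second column,23.1,5) are split on commas as usual, so mixing quoted and unquoted fields is not a problem.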
Now, I'd like to extract a specific column (say, the third one), e.g. to print it on standard output, one value per line. I can't simply use commas as field delimiters (and thus, e.g., use cut); rather, I need to do something more sophisticated. What could that be?
Note: I'm using bash on a Linux system.
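One possible approach, sketched under the assumption that python3 is installed (it is not part of bash itself): a small filter that reads CSV from standard input and prints a 1-based column, one value per line. The script name csvcol.py and the function extract_column are hypothetical; quoting is handled by the standard csv module with its default RFC 4180-style dialect.

```python
#!/usr/bin/env python3
"""Print one column of a CSV stream, one value per line.

Hypothetical usage from bash:  python3 csvcol.py 3 < data.csv
"""
import csv
import sys
from typing import Iterable, Iterator


def extract_column(lines: Iterable[str], col: int) -> Iterator[str]:
    """Yield the value of 1-based column `col` from each CSV row.

    Rows with fewer than `col` fields (including blank lines) yield "".
    """
    if col < 1:
        raise ValueError("column numbers start at 1")
    # csv.reader strips the line terminator and unquotes fields,
    # so commas inside "..." stay part of the value.
    for row in csv.reader(lines):
        yield row[col - 1] if len(row) >= col else ""


def main(argv: list[str]) -> None:
    if len(argv) != 2:
        print(f"usage: {argv[0]} COLUMN < file.csv", file=sys.stderr)
        return
    # No field contains a newline (per the question), so reading
    # stdin line by line is safe.
    for value in extract_column(sys.stdin, int(argv[1])):
        print(value)


if __name__ == "__main__":
    main(sys.argv)
```

For the sample above, python3 csvcol.py 3 < file.csv prints baz, 13.0 and 23.1 on separate lines; to drop the header, run tail -n +2 file.csv | python3 csvcol.py 3 instead. Delegating to the csv module avoids hand-writing a regular expression for quoted fields, which tends to break on doubled quotes or empty quoted fields.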