可以将文章内容翻译成中文,广告屏蔽插件可能会导致该功能失效(如失效，请关闭广告屏蔽插件后再试):

问题:

I have data as per below

123,"john,test",John"test,""john"",345

The above needs to be split as per below,

123

"john,test"

John"test

""john""

345

I tried using sed to handle comma within quotes while splitting but for the data which is within multiple double quotes is not displayed correctly. And also the data having double quotes in between also is not getting handled. I tried using awk but couldn't make use of fpat feature as we have older version of awk.

Can you help with the solution for the same?

回答1:

This might work for you (GNU sed):

sed -r 's/([^",]*("[^"]*"[^",]*)*),/\1\n/g' file

Replace all commas not surrounded by double quotes with newlines.

In more depth: Group zero or more characters that do not contain double quotes or commas, followed by zero or more groups of double quotes followed by zero or more non double quote characters (may be commas), followed by a double quote followed by zero or more characters that do not contain double quotes, followed by a comma, and replace the final comma by a newline. Do this globally throughout the file.

Now if the double quotes or commas are quoted ...

回答2:

sed is for simple subsitutions on individual lines, that is all. For anything else such as your problem you should be using awk. See What's the most robust way to efficiently parse CSV using awk? for how to robustly handle CSV with any awk, even if the fields contain newlines but wrt your specific problem - you said you can't use FPAT but FPAT is just a GNU shorthand for a while(match()) loop in any awk so if you can't do this:

$ awk -v FPAT='[^,]+|"[^"]+"' '{for (i=1; i<=NF;i++) print i, "<" $i ">"}' file
1 <123>
2 <"john,test">
3 <John"test>
4 <""john"">
5 <345>

then do this instead:

$ awk '{ while ( match($0,/[^,]+|"[^"]+"/) ) { print ++i, "<" substr($0,RSTART,RLENGTH) ">"; $0=substr($0,RSTART+RLENGTH) } }' file
1 <123>
2 <"john,test">
3 <John"test>
4 <""john"">
5 <345>