I have data as per below
123,"john,test",John"test,""john"",345
The above needs to be split as per below,
123
"john,test"
John"test
""john""
345
I tried using sed to handle comma within quotes while splitting but for the data which is within multiple double quotes is not displayed correctly. And also the data having double quotes in between also is not getting handled. I tried using awk but couldn't make use of fpat feature as we have older version of awk.
Can you help with the solution for the same?
This might work for you (GNU sed):
sed -r 's/([^",]*("[^"]*"[^",]*)*),/\1\n/g' file
Replace all commas not surrounded by double quotes with newlines.
In more depth: Group zero or more characters that do not contain double quotes or commas, followed by zero or more groups of double quotes followed by zero or more non double quote characters (may be commas), followed by a double quote followed by zero or more characters that do not contain double quotes, followed by a comma, and replace the final comma by a newline. Do this globally throughout the file.
Now if the double quotes or commas are quoted ...
sed is for simple subsitutions on individual lines, that is all. For anything else such as your problem you should be using awk. See What's the most robust way to efficiently parse CSV using awk? for how to robustly handle CSV with any awk, even if the fields contain newlines but wrt your specific problem - you said you can't use FPAT but FPAT is just a GNU shorthand for a while(match()) loop in any awk so if you can't do this:
$ awk -v FPAT='[^,]+|"[^"]+"' '{for (i=1; i<=NF;i++) print i, "<" $i ">"}' file
1 <123>
2 <"john,test">
3 <John"test>
4 <""john"">
5 <345>
then do this instead:
$ awk '{ while ( match($0,/[^,]+|"[^"]+"/) ) { print ++i, "<" substr($0,RSTART,RLENGTH) ">"; $0=substr($0,RSTART+RLENGTH) } }' file
1 <123>
2 <"john,test">
3 <John"test>
4 <""john"">
5 <345>