awk ignore delimiter inside single quote within a

Posted 2019-08-20 07:45

Question:

I have a set of data in a CSV file, as below:

 Given Data:
 (12,'hello','this girl,is lovely(adorable \r\n actually)',goodbye),
 (13,'hello','this fruit,is super tasty (sweet actually)',goodbye)

I want to print the given data as 2 rows, each running from ( to ), while ignoring the , delimiter and any ( ) that appear inside the ' ' fields.

How can I do this using awk or sed on Linux?

Expected result as below:

 Expected Result: 
 row 1 = 12,'hello','this girl,is lovely(adorable actually)',goodbye
 row 2 = 13,'hello','this fruit,is super tasty (sweet actually)',goodbye

UPDATE: I just noticed that there is a comma between the 2 rows. How can I split the data into 2 rows at the , that comes after ) and before (?

Answer 1:

You can use the following awk command to achieve your goal:

awk -i.bak '{str=substr($0,2,length($0)-2); gsub("\\\\r ?|\\\\n ?","",str); print "row "NR" = "str;}' file.in

Tested on your input.

Explanation:

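  • -i.bak is meant to take a backup of your file before it is modified in place. Note that this spelling is sed's: in gawk 4.1+ the in-place extension is loaded with -i inplace (plus -v INPLACE_SUFFIX=.bak for a backup), and if you only want the rows printed the option can simply be dropped.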
  • {str=substr($0,2,length($0)-2); gsub("\\\\r ?|\\\\n ?","",str); print "row "NR" = "str;} first strips the first and last characters of the line (the outer parentheses), then removes the literal \r and \n sequences, and finally prints the result in the format you want.
  • If your file has a header line, add the condition NR>1 before the {...} block to skip it: 'NR>1{str=substr($0,2,length($0)-2); gsub("\\\\r ?|\\\\n ?","",str); print "row "NR" = "str;}'
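As a rough illustration of the steps listed above, the sketch below wraps the same substr/gsub logic in a shell function and feeds it one sample line. The function name strip_rows is a hypothetical helper, not part of the original answer, and the -i.bak option is left out on the assumption that only printing is needed.

```shell
#!/bin/sh
# Sketch only: strip_rows is a hypothetical helper name.
# It reads lines on stdin and prints each one as "row N = ...".
strip_rows() {
    awk '{
        # Drop the first and last character (the outer parentheses).
        str = substr($0, 2, length($0) - 2)
        # Remove the literal two-character tokens \r and \n,
        # together with one optional trailing space.
        gsub(/\\r ?|\\n ?/, "", str)
        print "row " NR " = " str
    }'
}

# Demo on a single line; printf %s keeps the backslashes literal.
printf '%s\n' "(12,'hello','this girl,is lovely(adorable \r\n actually)',goodbye)" | strip_rows
```

This assumes each input line starts with ( and ends with ); a trailing comma after ) would need to be handled separately.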

Following the change in your requirements, I have adapted the awk command so that the , between rows is treated as part of the record separator (row separator):

awk -i.bak 'BEGIN{RS=",\n|\n"}{str=substr($0,2,length($0)-2); gsub("\\\\r ?|\\\\n ?","",str); print "row "NR" = "str;}' file.in

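Here BEGIN{RS=",\n|\n"} sets the record separator to either a comma followed by a newline or a bare newline, so each parenthesized row becomes one record. A multi-character (regex) RS is supported by gawk and mawk, but POSIX leaves its behavior unspecified.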
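If your awk does not accept a regex RS, one possible alternative is to keep the default record separator and trim each line with sub() instead. The sketch below is that swapped-in technique, not the answer's RS approach; split_rows is a hypothetical helper name, and the input is assumed to hold one parenthesized row per line.

```shell
#!/bin/sh
# Sketch only: split_rows is a hypothetical helper name.
# Portable alternative to RS=",\n|\n": trim each line with sub().
split_rows() {
    awk '{
        sub(/,[ \t]*$/, "")        # drop the comma that separates two rows
        sub(/^[ \t]*\(/, "")       # drop the opening parenthesis
        sub(/\)[ \t]*$/, "")       # drop the closing parenthesis
        gsub(/\\r ?|\\n ?/, "")    # drop the literal \r and \n tokens
        if ($0 != "") {            # skip lines that end up empty
            n++
            print "row " n " = " $0
        }
    }'
}

# Demo on the two sample rows from the question.
printf '%s\n' \
    "(12,'hello','this girl,is lovely(adorable \r\n actually)',goodbye)," \
    "(13,'hello','this fruit,is super tasty (sweet actually)',goodbye)" | split_rows
```

Because the rows are numbered with a separate counter rather than NR, blank lines do not leave gaps in the numbering.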