I have a .CSV file (file.csv) whose data are all enclosed in double quotes. Sample format of the file is as below:
column1,column2,column3,column4,column5,column6, column7, Column8, Column9, Column10
"12","B000QRIGJ4","4432","string with quotes, and with a comma, and colon: in between","4432","author1, name","890","88","11-OCT-11","12"
"4432","B000QRIGJ4","890","another, string with quotes, and with more than, two commas: in between","455","author2, name","12","455","12-OCT-11","55"
"11","B000QRIGJ4","77","string with, commas and (paranthesis) and : colans, in between","12","author3, name","333","22","13-OCT-11","232"
The 9th field is the date field in the format "DD-MMM-YY". I have to convert it to the format YYYY/MM/DD. I am trying to use the below code, but of no use.
awk -F, '
BEGIN {
split("JAN FEB MAR APR MAY JUN JUL AUG SEP OCT NOV DEC", month, " ")
for (i=1; i<=12; i++) mdigit[month[i]]=i
}
{ m=substr($9,4,3)
$9 = sprintf("%02d/%02d/"20"%02d",mdigit[m],substr($9,1,2),substr($9,8,20))
print
}' OFS="," file.csv > temp_file.csv
The out put of the file temp_file.csv after executing the above code is as shown below.
column1,column2,column3,column4,column5,column6,column7,Column8,00/00/2000,Column10
"12","B000QRIGJ4","4432","string with quotes, and with a comma, and colon: in between","4432","author1,00/00/2000,"890","88","11-OCT-11","12"
"4432","B000QRIGJ4","890","another, string with quotes, and with more than, two commas: in between","455",00/00/2002, name","12","455","12-OCT-11","55"
"11","B000QRIGJ4","77","string with, commas and (paranthesis) and : colans, in between","12","author3,00/00/2000,"333","22","13-OCT-11","232"
As far as I am understand, the issue is with the commas in the double quote as my code is taking them into consideration too... Please suggest on the below questions:
1) Does the double quoting all the values in all the fields make any difference? If they make any difference, how do I get rid of them from all the values except the strings with commas in them? 2) Any modifications to my code so I could format the 9th field which in the format "DD-MMM-YYYY" to YYYY/MM/DD