I have a UNIX shell script that contains a nawk block. (This is only part of the script; there is a lot more logic, and the code below definitely has to stay in nawk.)
The block looks up the ISO country code for a country name in a file of country codes and country names. It fails whenever the country name contains parentheses ()
or a single apostrophe '
Sample values
CIV@COTE D'IVOIRE
COD@CONGO, Democratic Republic of (was Zaire)
Can you please help me overcome these two issues? For the single apostrophe, can I have it removed from the string, or is there a way to fine-tune the existing code?
Code
processbody() {
    nawk '{
        COUNTRY_NAME = "COTE D'IVOIRE"
        if (COUNTRY_NAME != " "){
            file = "/tmp/country_codes.txt"
            FS = "@"
            while( getline < file ) {
                if( $0 ~ COUNTRY_NAME ) {
                    COUNTRY_CODE = $1
                }
            }
            close( file )
        }
        printf("%s\n",COUNTRY_CODE) > "/tmp/code.txt"
    }' /tmp/file.txt
}
You need to understand where the Unix shell is handling quotes and where Awk is handling quotes.
Given that the script needs both single quotes and double quotes, I think you would be best off putting the awk program in its own file and then running:
awk -f awk.script [file1 ...]
This avoids all the issues of whether the shell is going to understand it or not.
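As a rough sketch of the program-file approach, the snippet below writes a small awk program to a file and runs it with -f. The file names, the variable name and the exact-match test on the second field are assumptions made for illustration; the here-documents are only there so the example is self-contained, and plain awk stands in for nawk.

```shell
# Hypothetical scratch directory so the sketch does not touch real files.
tmpdir=$(mktemp -d)

# Sample lookup data taken from the question.
cat > "$tmpdir/country_codes.txt" <<'EOF'
CIV@COTE D'IVOIRE
COD@CONGO, Democratic Republic of (was Zaire)
EOF

# The awk program lives in its own file, so the apostrophe needs no shell escaping.
cat > "$tmpdir/lookup.awk" <<'EOF'
BEGIN {
    FS = "@"
    name = "COTE D'IVOIRE"
}
$2 == name { print $1 }
EOF

code=$(awk -f "$tmpdir/lookup.awk" "$tmpdir/country_codes.txt")
echo "$code"
rm -rf "$tmpdir"
```

In a real script the .awk file would be kept alongside the shell script rather than generated on the fly.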
If you can't do that, then you should probably continue using single quotes to surround the awk script, but each occurrence of
'
inside the script must be replaced by:
'\''
The first quote terminates the prevailing single-quoted string. The backslash-quote embeds a single quote into the string. The third quote resumes normal single-quoted string operation, where the only special character is single quote.
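A minimal sketch of that replacement, assuming the country name is hard-coded in the awk program and compared exactly against the second field (plain awk stands in for nawk, and the input line is fed through a pipe only to keep the example short):

```shell
# The apostrophe in the awk string is written as '\'' so the shell passes it through.
code=$(printf '%s\n' "CIV@COTE D'IVOIRE" | awk -F'@' '
    BEGIN { name = "COTE D'\''IVOIRE" }
    $2 == name { print $1 }
')
echo "$code"
```

By the time awk sees the program, the string is simply "COTE D'IVOIRE".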
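If this code appears in a shell script in this form, the apostrophe terminates the single-quoted string that holds the nawk code. A backslash alone does not help, because backslash is not special inside shell single quotes. One way around it is the octal escape \047 inside the awk string. Something like:
COUNTRY_NAME = "COTE D\047IVOIRE"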
In the parentheses case, you need to escape them in the string so that nawk doesn't treat them as regexp grouping operators:
COUNTRY_NAME = "CONGO, Democratic Republic of \\(was Zaire\\)"
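The sketch below compares the escaped-regex approach with a literal substring test using index(), which does no regex interpretation at all. Using index() is an alternative not mentioned above and assumes a plain substring match is what the lookup needs; the variable names are made up for the example, and plain awk stands in for nawk.

```shell
line='COD@CONGO, Democratic Republic of (was Zaire)'

# Regex match: each doubled backslash in the awk string yields one backslash in the regex.
by_regex=$(printf '%s\n' "$line" | awk -F'@' '
    BEGIN { pattern = "CONGO, Democratic Republic of \\(was Zaire\\)" }
    $2 ~ pattern { print $1 }
')

# Literal match: index() returns the position of the substring, or 0 if absent.
by_index=$(printf '%s\n' "$line" | awk -F'@' '
    BEGIN { name = "CONGO, Democratic Republic of (was Zaire)" }
    index($2, name) > 0 { print $1 }
')

echo "$by_regex $by_index"
```

The index() form has the advantage that names read from a file or a variable need no escaping before they are compared.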
This is a quoting issue. Pass the value to nawk using the -v option.
Instead of
nawk '{
COUNTRY_NAME = "COTE D'IVOIRE"
if (COUNTRY_NAME != " "){ ...
Use
nawk -v "COUNTRY_NAME=COTE D'IVOIRE" '{
if (COUNTRY_NAME != " "){ ...
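A fuller sketch of the -v approach follows. Note that -v only takes care of the apostrophe; a name with parentheses would still be read as a regex by the ~ operator, so this version compares the second field with == instead. The helper name lookup_code, the temporary file and the exact-match comparison are assumptions for illustration, and plain awk stands in for nawk.

```shell
# Temporary lookup file with the sample values from the question.
codes=$(mktemp)
cat > "$codes" <<'EOF'
CIV@COTE D'IVOIRE
COD@CONGO, Democratic Republic of (was Zaire)
EOF

# lookup_code is a hypothetical helper: $1 is the country name, $2 the lookup file.
lookup_code() {
    awk -F'@' -v name="$1" '$2 == name { print $1; exit }' "$2"
}

civ=$(lookup_code "COTE D'IVOIRE" "$codes")
cod=$(lookup_code "CONGO, Democratic Republic of (was Zaire)" "$codes")
echo "$civ $cod"
rm -f "$codes"
```

One caveat: awk processes backslash escape sequences in a -v value, so a name containing a backslash would need extra care.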