Removing blank spaces in specific column in pipe d

Posted 2019-07-25 14:36

Question:

Good morning. Long time reader, first time emailer so please be gentle.

I'm working on AIX 5.3 and have a 42 column pipe delimited file. There are telephone numbers in columns 15 & 16 (land|mobile) which may or may not contain spaces depending on who has keyed in the data.

I need to remove these spaces from columns 15 & 16 only, i.e.

Column 15   |   Column 16 **Currently**
01942 665432|07865346122
01942756423 |07855 333567
Column 15   |   Column 16 **Needs to be**
01942665432|07865346122
01942756423|07855333567

I have a quick & dirty script which is unfortunately proving to be anything but quick: it's a while loop that reads every single line, cuts each field on the pipe delimiter, assigns it to a variable, uses sed on columns 15 & 16 only to strip blank spaces, then writes the line out to a new file, i.e.

cat $file | while read output
do
.....
fourteen=$( echo $output | cut -d'|' -f14 )
fifteen=$( echo $output | cut -d'|' -f15 | sed 's/ //g' )
echo ".....$fourteen|$fifteen..." > $new_file
done

I know there must be a better way to do this, probably using awk, and I'm open to any suggestion anyone can offer: the script as it stands takes over half an hour to process 176,000 records.

Thanks in advance.

Answer 1:

Yes, awk is better suited here:

$ cat ip.txt 
a|foo bar|01942 665432|07865346122|123
b|i j k |01942756423 |07855 333567|90870

$ awk 'BEGIN{FS=OFS="|"} {gsub(" ","",$3); gsub(" ","",$4)} 1' ip.txt 
a|foo bar|01942665432|07865346122|123
b|i j k |01942756423|07855333567|90870
  • BEGIN{FS=OFS="|"} sets | as both the input and the output field separator
  • gsub(" ","",$3) replaces all spaces with nothing, in column 3 only
  • gsub(" ","",$4) replaces all spaces with nothing, in column 4 only
  • 1 is the idiomatic way to print the input record (including any modifications made)

Change 3 and 4 to whichever fields you need (15 and 16 for the file described in the question).
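
If the column numbers are likely to change, the same gsub approach can be wrapped so the fields are passed in as a list rather than hard-coded. The following is only a sketch: the function name strip_cols and the variable cols are made up for illustration, and it assumes a POSIX awk that accepts -v. The NF check is an added precaution (not part of the command above) so that records with fewer fields are passed through untouched.

```shell
# Hypothetical helper (name and interface are assumptions, not from the answer).
# Usage: strip_cols "15,16" input_file > output_file
#        some_command | strip_cols "15,16"
strip_cols() {
    cols=$1          # comma-separated list of 1-based column numbers
    shift            # remaining arguments, if any, are input files
    awk -v cols="$cols" '
        BEGIN {
            FS = OFS = "|"             # pipe is both input and output separator
            n = split(cols, c, ",")    # c[1..n] holds the requested columns
        }
        {
            for (i = 1; i <= n; i++)
                if (c[i] + 0 <= NF)    # leave records that are too short alone
                    gsub(/ /, "", $(c[i] + 0))
        }
        1                              # print the (possibly modified) record
    ' "$@"
}

# Demo on the two sample lines from the answer, stripping columns 3 and 4:
printf '%s\n' \
    'a|foo bar|01942 665432|07865346122|123' \
    'b|i j k |01942756423 |07855 333567|90870' | strip_cols 3,4
# a|foo bar|01942665432|07865346122|123
# b|i j k |01942756423|07855333567|90870
```

Because awk reads the file once and does the substitution itself, no cut or sed process is started per line, which is where the while loop in the question spends most of its time.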


If the first line should not be affected, add a condition:

awk 'BEGIN{FS=OFS="|"} NR>1{gsub(" ","",$3); gsub(" ","",$4)} 1' ip.txt 
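
Applied to the file described in the question (columns 15 and 16, results written to a new file), the header-preserving variant might look like the sketch below. The function name clean_phone_cols and the file names in the usage comment are placeholders, and whether the real file actually has a header line is an assumption; drop the NR > 1 condition if it does not.

```shell
# Hypothetical wrapper: $1 is the input file, $2 the output file.
clean_phone_cols() {
    awk 'BEGIN { FS = OFS = "|" }          # pipe-delimited in and out
         NR > 1 {                          # skip the first (header) line
             gsub(/ /, "", $15)            # land line number
             gsub(/ /, "", $16)            # mobile number
         }
         1                                 # print every record, header included
    ' "$1" > "$2"
}

# Example call (file names are placeholders):
# clean_phone_cols "$file" "$new_file"
```

The redirection sits outside awk, so the output file is opened once for the whole run instead of once per record.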


Tags: unix awk sed aix