Optimize shell script for multiple sed replacements

Posted 2020-02-12 08:24

I have a file containing a list of replacement pairs (about 100 of them) which are used by sed to replace strings in files.

The pairs go like:

old|new
tobereplaced|replacement
(stuffiwant).*(too)|\1\2

and my current code is:

cat replacement_list | while read i
do
    old=$(echo "$i" | awk -F'|' '{print $1}')    #due to the need for extended regex
    new=$(echo "$i" | awk -F'|' '{print $2}')
    sed -r "s/`echo "$old"`/`echo "$new"`/g" -i file
done

I can't help thinking that there is a more efficient way of performing the replacements. I tried turning the loop around to iterate over the lines of the file first, but that turned out to be much more expensive.

Are there any other ways of speeding up this script?

EDIT

Thanks for all the quick responses. Let me try out the various suggestions before choosing an answer.

One thing to clear up: I also need subexpressions/groups functionality. For example, one replacement I might need is:

([0-9])U|\10  #the extra brackets and escapes were required for my original code
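
For reference, here is a quick check of how that replacement is read. It is a minimal sketch that assumes GNU sed with -r; the sample input '5U 7U' is made up for illustration. sed only has the single-digit back-references \1 through \9, so \10 means "group 1, then a literal 0".

```shell
# \1 is the captured digit; the 0 that follows is a literal zero,
# because sed has no two-digit back-references.
result=$(echo '5U 7U' | sed -r 's/([0-9])U/\10/g')
echo "$result"
```

So each digit followed by U is rewritten as that digit followed by 0.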

Some details on the improvements (to be updated):

  • Method: processing time
  • Original script: 0.85s
  • cut instead of awk: 0.71s
  • anubhava's method: 0.18s
  • chthonicdaemon's method: 0.01s

Tags: bash shell sed
7 answers
Fickle 薄情
Reply #2 · 2020-02-12 08:55

You can try this.

pattern=''
while read -r i    # -r keeps backslashes such as \1 intact
do
    old=$(echo "$i" | awk -F'|' '{print $1}')    #due to the need for extended regex
    new=$(echo "$i" | awk -F'|' '{print $2}')
    pattern="${pattern}s/${old}/${new}/g;"
done < replacement_list    # redirect instead of "cat |", so pattern is still set after the loop
sed -r "$pattern" -i file

This runs the sed command only once on the file, with all the replacements combined into a single script. You may also want to replace awk with cut. cut may be faster than awk, though I am not sure about that.

old=$(echo "$i" | cut -d'|' -f1)
new=$(echo "$i" | cut -d'|' -f2)
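
The same "one sed run" idea can also be done without a shell loop, by letting sed turn the list into a sed script and then running it with -f. This is only a sketch under some assumptions: GNU sed (for -r and -i), no pattern or replacement containing "/" or a literal "|", and scratch files created here purely for the demo (the workdir and script.sed names are hypothetical).

```shell
# Hypothetical scratch files so the sketch can run on its own.
workdir=$(mktemp -d)
printf '%s\n' 'old|new' '(stuffiwant).*(too)|\1\2' '([0-9])U|\10' > "$workdir/replacement_list"
printf '%s\n' 'old stuffiwant and this too 5U' > "$workdir/file"

# Turn each "pattern|replacement" line into an "s/pattern/replacement/g" command.
# "#" is used as the delimiter here so the slashes need no escaping.
sed 's#^\([^|]*\)|\(.*\)$#s/\1/\2/g#' "$workdir/replacement_list" > "$workdir/script.sed"

# A single sed process applies every rule; -i edits the file in place (GNU sed).
sed -r -f "$workdir/script.sed" -i "$workdir/file"

result=$(cat "$workdir/file")
echo "$result"
```

The rules are applied in list order on each line, so a later rule sees the output of the earlier ones, just as in the loop version.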