Shell script - search and replace text in multiple

Posted 2019-02-09 09:01

Question:

I have a file "changesDictionary.txt" containing (a variable number of) pairs of key-value strings.

e.g.

"textToSearchFor" = "theReplacementText"

(The format of the dictionary is unimportant, and can be changed as required.)

I need to iterate through the contents of a given directory, including sub-directories. For each file encountered with the extension ".txt", we search for each of the keys in changesDictionary.txt, replacing each found instance with the replacement string value.

i.e. a search and replace over multiple files, but using a list of search/replace terms rather than a single search/replace term.

How could I do this? (I have studied single search/replace examples, but do not understand how to do multiple searches within a file.)

The implementation (bash, perl, whatever) is not important as long as I can run it from the command line in Mac OS X. Thanks for any help.

Answer 1:

I'd convert your changesDictionary.txt file to a sed script, with... sed:

$ sed -e 's/^"\(.*\)" = "\(.*\)"$/s\/\1\/\2\/g/' \
      changesDictionary.txt  > changesDictionary.sed

Note, any characters in your dictionary that are special to regular expressions or to sed (such as ".", "*", "[", "\", "&" and "/") will be interpreted by sed rather than matched literally. So either your dictionary can contain only the most primitive search-and-replacements, or you'll need to maintain the sed file by hand with valid, properly escaped expressions. Unfortunately, there's no easy way in sed to shut off regular expressions and use only string matching, or to quote your searches and replacements as "literals".
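One way around this is to add the escaping while generating the sed script. The following is only a minimal sketch, assuming the dictionary keeps the "key" = "value" format, that keys and values contain no newlines, and that sed uses basic regular expressions; the sample entries, sample.txt and the temporary directory are made up for illustration, and awk is swapped in for the sed one-liner because escaping is easier to express there.

```shell
#!/bin/sh
# Sketch: turn "key" = "value" lines into sed commands that match literally.
# The sample entries and file names are made up for illustration.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

cat > changesDictionary.txt <<'EOF'
"a.b" = "x/y"
"1*2" = "R&D"
EOF

awk '
# Put a backslash in front of every character of s that appears in specials.
function esc(s, specials,    out, i, c) {
    out = ""
    for (i = 1; i <= length(s); i++) {
        c = substr(s, i, 1)
        if (index(specials, c) > 0) out = out "\\"
        out = out c
    }
    return out
}
{
    line = $0
    sub(/^"/, "", line); sub(/"$/, "", line)   # drop the outer quotes
    sep = index(line, "\" = \"")               # position of the middle: " = "
    if (sep < 2) next                          # skip lines in another format
    key = substr(line, 1, sep - 1)
    val = substr(line, sep + 5)
    # Pattern side: escape BRE metacharacters and the / delimiter.
    # Replacement side: escape backslash, the / delimiter, and &.
    print "s/" esc(key, "\\/.*[^$") "/" esc(val, "\\/&") "/g"
}' changesDictionary.txt > changesDictionary.sed

# Try the generated script on a line that would fool an unescaped pattern.
printf '%s\n' 'a.b aXb 1*2 12' > sample.txt
result=$(sed -f changesDictionary.sed sample.txt)
printf '%s\n' "$result"
```

With the sample entries, the generated script holds s/a\.b/x\/y/g and s/1\*2/R\&D/g, and the sample line comes out as "x/y aXb R&D 12": "aXb" and "12" are left alone because "." and "*" are no longer treated as wildcards.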

With the resulting sed script, use find and xargs -- rather than find -exec -- to convert your files with the sed script as quickly as possible, by processing them more than one at a time.

$ find somedir -type f -name '*.txt' -print0 \
   | xargs -0 sed -i~ -f changesDictionary.sed
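To see why xargs is the faster choice, it can help to count how often the command is actually started. This is a small sketch under made-up assumptions (five sample files in a temporary directory, and a stand-in command that only prints "run" instead of calling sed):

```shell
#!/bin/sh
# Sketch: count how many times a command is started for the same set of files.
# The directory layout and file names are made up for illustration.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
mkdir -p somedir/sub
for n in 1 2 3 4 5; do echo "text $n" > "somedir/sub/file$n.txt"; done

# find -exec ... \; starts the command once per file.
per_file=$(find somedir -type f -name '*.txt' -exec sh -c 'echo run' sh {} \; \
           | wc -l | tr -d ' ')

# xargs packs as many file names as fit onto each command line.
batched=$(find somedir -type f -name '*.txt' -print0 \
          | xargs -0 sh -c 'echo run' sh | wc -l | tr -d ' ')

echo "find -exec: $per_file run(s); xargs: $batched run(s)"
```

For these five files the stand-in command starts five times under find -exec ... \; and once under xargs. find's own -exec ... {} + form batches file names in a similar way, if you prefer to avoid the pipe.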

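Note, the -i option of sed edits files "in-place", so be sure to make backups for safety; -i~ keeps a tilde-suffixed backup of each file. On Mac OS X, the BSD sed requires a backup suffix after -i (a bare -i would swallow the next argument), so write -i~ there, or -i '' for no backup.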

A final note: search-and-replace can have unintended consequences. Will some of your search strings be substrings of others? Here's an example.

$ cat changesDictionary.txt
"fix" = "broken"
"fixThat" = "Fixed"
$ sed -e 's/^"\(.*\)" = "\(.*\)"$/s\/\1\/\2\/g/' changesDictionary.txt  \
   | tee changesDictionary.sed
s/fix/broken/g
s/fixThat/Fixed/g
$ mkdir subdir
$ echo fixThat > subdir/target.txt
$ find subdir -type f -name '*.txt' -print0 \
   | xargs -0 sed -i~ -f changesDictionary.sed
$ cat subdir/target.txt
brokenThat

Should "fixThat" have become "Fixed" or "brokenThat"? Order matters in a sed script: commands run top to bottom, so the "fix" rule rewrites the text before the "fixThat" rule ever sees it. Similarly, text can be replaced more than once -- an "a" changed to "b" by one rule may be changed again by a later rule that turns "b" into "c".

Perhaps you've already considered both of these, but I mention them because I've tried what you are doing before and didn't think of them. I don't know of anything that simply does the right thing when doing multiple search-and-replacements at once, so you need to program it to do the right thing yourself.
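As one possible definition of "the right thing", the sketch below applies all replacements in a single left-to-right pass: at each position the longest matching key wins, and replaced text is never scanned again. It is only an illustration under stated assumptions: keys are matched as literal strings (no regular expressions), the dictionary keeps the "key" = "value" format, and target.txt, target.new and the extra "a"/"b" entries are made up.

```shell
#!/bin/sh
# Sketch: single-pass, longest-match-first replacement with literal keys.
# File names and sample entries are made up for illustration.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

cat > changesDictionary.txt <<'EOF'
"fix" = "broken"
"fixThat" = "Fixed"
"a" = "b"
"b" = "c"
EOF
echo 'fixThat fix a b' > target.txt

awk '
# First file: the dictionary, one "key" = "value" pair per line.
FNR == NR {
    line = $0
    sub(/^"/, "", line); sub(/"$/, "", line)
    sep = index(line, "\" = \"")
    if (sep > 1) repl[substr(line, 1, sep - 1)] = substr(line, sep + 5)
    next
}
# Second file: the text to transform.
{
    out = ""
    rest = $0
    while (rest != "") {
        # Find the longest key that starts at the current position.
        best = ""
        for (key in repl)
            if (length(key) > length(best) && substr(rest, 1, length(key)) == key)
                best = key
        if (best != "") {
            out = out repl[best]                 # emit the replacement ...
            rest = substr(rest, length(best) + 1) # ... and skip past the key
        } else {
            out = out substr(rest, 1, 1)          # no key here: copy one character
            rest = substr(rest, 2)
        }
    }
    print out
}' changesDictionary.txt target.txt > target.new

cat target.new
```

For the sample line this yields "Fixed broken b c": "fixThat" is not clobbered by the shorter "fix" rule, and the "b" produced from "a" is not turned into "c". The loop tries every key at every position, so it is slow for large dictionaries; it is meant to show the idea, not to be fast.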



Answer 2:

Here are the basic steps I would do

  1. Copy the changesDictionary.txt file
  2. In the copy, replace each "a"="b" pair with the equivalent sed line, e.g. (use $1 for the file name; -i~ edits the file in place and keeps a ~ backup)

    sed -i~ -e 's/a/b/g' "$1"

    (you could write a script to do this or just do it by hand, if you just need to do this once and it's not too big).

  3. If the files are all in one directory, then you can do something like:

    for f in *.txt; do ./scriptFromStep2.sh "$f"; done

  4. If they are in subdirs, use a find to call that script on all of the files, something like

    find . -name '*.txt' -exec ./scriptFromStep2.sh {} \;

These commands aren't exact, so do some experiments to make sure you get it right -- it's just the approach I would use.
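The steps above can be sketched end to end as follows. This is a rough illustration, not a finished tool: the two replacement pairs, the docs directory and its files are made up, and the script rewrites each file through a temporary file rather than relying on any particular sed -i behaviour.

```shell
#!/bin/sh
# Sketch: a hand-written replacement script run over a directory tree.
# All file names and replacement pairs below are made up for illustration.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
mkdir -p docs/sub
echo "a cat" > docs/one.txt
echo "a hat" > docs/sub/two.txt

# Step 2: one -e expression per dictionary pair, applied to the file in $1.
cat > scriptFromStep2.sh <<'EOF'
#!/bin/sh
# Write the result to a temporary file, then replace the original with it.
sed -e 's/cat/dog/g' -e 's/hat/cap/g' "$1" > "$1.tmp" && mv "$1.tmp" "$1"
EOF

# Step 4: call the script once for every .txt file, including sub-directories.
# Running it through sh avoids having to mark it executable first.
find docs -type f -name '*.txt' -exec sh ./scriptFromStep2.sh {} \;

cat docs/one.txt docs/sub/two.txt
```

After the run, docs/one.txt contains "a dog" and docs/sub/two.txt contains "a cap". Putting all pairs into one sed call means each file is read once, but the expressions are still applied in order, so the ordering caveats from the first answer apply here too.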

(but, if you can, just use perl, it would be a lot simpler)



Answer 3:

Use this tool, which is written in Perl and has quite a lot of bells and whistles. It's an oldie, but a goodie:

http://unixgods.org/~tilo/replace_string/

Features:

  • do multiple search-replace or query-search-replace operations
  • search-replace expressions can be given on the command line or read from a file
  • processes multiple input files
  • recursively descend into directory and do multiple search/replace operations on all files
  • user defined perl expressions are applied to each line of each input file
  • optionally run in paragraph mode (for multi-line search/replace)
  • interactive mode
  • batch mode
  • optionally backup files and backup numbering
  • preserve modes/owner when run as root
  • ignore symbolic links, empty files, write protected files, sockets, named pipes, and directory names
  • optionally replace lines only matching / not matching a given regular expression

This script has been used quite extensively over the years with large data sets.



Answer 4:

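#!/bin/bash
# Assumes the dictionary has one key=value pair per line, with no quotes and
# no spaces around "=" (the question allows changing the format).
f="changesDictionary.txt"
find /path -type f -name "*.txt" | while IFS= read -r FILE
do
    awk 'BEGIN{ FS="=" }
    # First file (the dictionary): remember each key and its replacement.
    FNR==NR{ s[$1]=$2;  next }
    {
       # Keys are treated as regular expressions, and the order in which
       # for-in visits them is unspecified.
       for(i in s){
        gsub(i,s[i])
       }
       print $0
    }' "$f" "$FILE"  > temp && mv temp "$FILE"
done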