“sed” special characters handling

Posted 2020-02-01 06:27

Question:

We have a sed command in our script that replaces file content with values from variables.

For example:

export value="dba01upc\Fusion_test"
sed -i "s%{"sara_ftp_username"}%$value%g" /home_ldap/user1/placeholder/Sara.xml

The sed command drops special characters such as '\' and inserts the string "dba01upcFusion_test" without the '\'. It works if I do the export like export value='dba01upc\Fusion_test' (with the '\' surrounded by ‘’). Unfortunately, our client wants to export the original text dba01upc\Fusion_test with single or double quotes and does not want to add any extra characters to the text. Can anyone tell me how to make sed insert the text with its special characters intact?

Before Replacement : Sara.xml

<?xml version="1.0" encoding="UTF-8"?>
<ser:service-account >
<ser:description/>
<ser:static-account>
<con:username>{sara_ftp_username}</con:username>
</ser:static-account>
</ser:service-account>

After Replacement : Sara.xml

<?xml version="1.0" encoding="UTF-8"?>
<ser:service-account>
<ser:description/>
<ser:static-account>
<con:username>dba01upcFusion_test</con:username>
</ser:static-account>
</ser:service-account>

Thanks in advance

Answer 1:

You cannot robustly solve this problem with sed. Just use awk instead:

awk -v old="string1" -v new="string2" '
idx = index($0,old) {
    $0 = substr($0,1,idx-1) new substr($0,idx+length(old))
}
1' file

Ah, @mklement0 has a good point - to stop escapes from being interpreted you need to pass in the values in the arg list along with the file names and then assign the variables from that, rather than assigning values to the variables with -v (see the summary I wrote a LONG time ago for the comp.unix.shell FAQ at http://cfajohnson.com/shell/cus-faq-2.html#Q24 but apparently had forgotten!).
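
A minimal sketch of that difference, assuming a POSIX-conforming awk: the same string is passed once with -v and once through ARGV. The names str, viaV and viaArgv are made up for this illustration.

```shell
# Hypothetical demo: compare how awk receives a string via -v and via ARGV.
str='a\tb'   # a backslash followed by "t", not a real tab

# -v processes escape sequences, so \t turns into a tab character.
viaV=$(awk -v s="$str" 'BEGIN { print s }')

# ARGV elements are taken verbatim, so the backslash survives.
viaArgv=$(awk 'BEGIN { s = ARGV[1]; print s }' "$str")

printf '%s\n' "$viaV"
printf '%s\n' "$viaArgv"
```

Only the second form keeps the backslash, which is why the script in this answer reads both strings from ARGV.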

The following will robustly make the desired substitution (a\ta -> e\tf) on every search string found on every line:

$ cat tst.awk
BEGIN {
    old=ARGV[1]; delete ARGV[1]
    new=ARGV[2]; delete ARGV[2]
    lgthOld = length(old)
}
{
    head = ""; tail = $0
    while ( idx = index(tail,old) ) {
        head = head substr(tail,1,idx-1) new
        tail = substr(tail,idx+lgthOld)
    }
    print head tail
}

$ cat file
a\ta    a       a       a\ta

$ awk -f tst.awk 'a\ta' 'e\tf' file
e\tf    a       a       e\tf

The white space in file is tabs. You can shift ARGV[3] down and adjust ARGC if you like but it's not necessary in most cases.
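
Applied to the question's placeholder, the same approach might look like the sketch below. It inlines the tst.awk program and works on a throwaway copy in a temporary directory; the directory and the one-line file content are assumptions made for the demo. Because awk does not edit files in place, the output goes to a new file that is then renamed.

```shell
# Hypothetical demo file standing in for Sara.xml.
tmpdir=$(mktemp -d)
printf '%s\n' '<con:username>{sara_ftp_username}</con:username>' > "$tmpdir/Sara.xml"

value='dba01upc\Fusion_test'

# Same logic as tst.awk: literal search and replace, strings taken from ARGV.
awk '
BEGIN {
    old = ARGV[1]; delete ARGV[1]
    new = ARGV[2]; delete ARGV[2]
    lgthOld = length(old)
}
{
    head = ""; tail = $0
    while ( idx = index(tail, old) ) {
        head = head substr(tail, 1, idx - 1) new
        tail = substr(tail, idx + lgthOld)
    }
    print head tail
}' '{sara_ftp_username}' "$value" "$tmpdir/Sara.xml" > "$tmpdir/Sara.xml.new" &&
    mv "$tmpdir/Sara.xml.new" "$tmpdir/Sara.xml"

result=$(cat "$tmpdir/Sara.xml")
printf '%s\n' "$result"
rm -rf "$tmpdir"
```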



Answer 2:

Update with the benefit of hindsight, to present options:

  • Update 2: If you're intent on using sed, see the - somewhat cumbersome, but now robust and generic - solution below.
  • If you want a robust, self-contained awk solution that also properly handles both arbitrary search and replacement strings (but cannot incorporate regex features such as word-boundary assertions), see Ed Morton's answer.
  • If you want a pure bash solution and your input files are small and preserving multiple trailing newlines is not important, see Charles Duffy's answer.
  • If you want a full-fledged third-party templating solution, consider, for instance, j2cli, a templating CLI for Jinja2 - if you have Python and pip, install with sudo pip install j2cli.
    Simple example (note that since the replacement string is provided via a file, this may not be appropriate for sensitive data; note the double braces ({{...}})):

    value='dba01upc\Fusion_test'
    echo "sara_ftp_username=$value" >data.env
    echo '<con:username>{{sara_ftp_username}}</con:username>' >tmpl.xml
    j2 tmpl.xml data.env # -> <con:username>dba01upc\Fusion_test</con:username>
    

If you use sed, careful escaping of both the search and the replacement string is required, because:

  • As Ed Morton points out in a comment elsewhere, sed doesn't support use of literal strings as replacement strings - it invariably interprets special characters/sequences in the replacement string.
  • Similarly, the search string literal must be escaped in a way that its characters aren't mistaken for special regular-expression characters.
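
A small sketch of the first point, using the value from the question: the shell hands the backslash to sed unchanged, and sed then treats \F in the replacement as an escaped F. The second half escapes the replacement by hand for this one case only (a single-line value with % as the delimiter); the names naive, escaped and fixed are made up for the illustration.

```shell
value='dba01upc\Fusion_test'
input='<el>{sara_ftp_username}</el>'

# Unescaped: sed consumes the backslash while parsing the replacement.
naive=$(printf '%s\n' "$input" | sed "s%{sara_ftp_username}%$value%g")
printf '%s\n' "$naive"

# Escaped: prefix &, the delimiter % and \ with a backslash first.
# Assumption: the value contains no newlines.
escaped=$(printf '%s\n' "$value" | sed 's/[&%\]/\\&/g')
fixed=$(printf '%s\n' "$input" | sed "s%{sara_ftp_username}%$escaped%g")
printf '%s\n' "$fixed"
```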

The following uses two generic helper functions that perform this escaping (quoting) that apply techniques explained at "Is it possible to escape regex characters reliably with sed?":

#!/usr/bin/env bash

# SYNOPSIS
#   quoteRe <text>
# DESCRIPTION
#   Quotes (escapes) the specified literal text for use in a regular expression,
#   whether basic or extended - should work with all common flavors.
quoteRe() { sed -e 's/[^^]/[&]/g; s/\^/\\^/g; $!a\'$'\n''\\n' <<<"$1" | tr -d '\n'; }

# '

# SYNOPSIS
#  quoteSubst <text>
# DESCRIPTION
#  Quotes (escapes) the specified literal string for safe use as the substitution string (the 'new' in `s/old/new/`).
quoteSubst() {
  IFS= read -d '' -r < <(sed -e ':a' -e '$!{N;ba' -e '}' -e 's/[&/\]/\\&/g; s/\n/\\&/g' <<<"$1")
  printf %s "${REPLY%$'\n'}"    
}

# The search string.
search='{sara_ftp_username}'

# The replacement string; a demo value with characters that need escaping.
value='&\1%"'\'';<>/|dba01upc\Fusion_test'

# Use the appropriately escaped versions of both strings.
sed "s/$(quoteRe "$search")/$(quoteSubst "$value")/g" <<<'<el>{sara_ftp_username}</el>'

# -> <el>&\1%"';<>/|dba01upc\Fusion_test</el>

  • Both quoteRe() and quoteSubst() correctly handle multi-line strings.
    • Note, however, that since sed reads a single line at a time by default, using quoteRe() with multi-line strings only makes sense in sed commands that explicitly read multiple (or all) lines at once.
  • quoteRe() is always safe to use with a command substitution ($(...)), because it always returns a single-line string (newlines in the input are encoded as '\n').
  • By contrast, if you use quoteSubst() with a string that has trailing newlines, you mustn't use $(...), because the latter will remove the last trailing newline and therefore break the encoding (since quoteSubst() \-escapes actual newlines, the string returned would end in a dangling \).
    • Thus, for strings with trailing newlines, use IFS= read -d '' -r escapedValue < <(quoteSubst "$value") to read the escaped value into a separate variable first, then use that variable in the sed command.
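
The last point can be sketched as follows. The block repeats quoteSubst() so that it runs on its own; the two-line demo value and the search string X are assumptions chosen for the illustration.

```shell
#!/usr/bin/env bash

# Same helper as above, repeated so the sketch is self-contained.
quoteSubst() {
  IFS= read -d '' -r < <(sed -e ':a' -e '$!{N;ba' -e '}' -e 's/[&/\]/\\&/g; s/\n/\\&/g' <<<"$1")
  printf %s "${REPLY%$'\n'}"
}

# Hypothetical replacement text that ends in a newline.
value=$'line1\nline2\n'

# Read the escaped value with read, not $(...), so that the trailing
# newline (and the backslash escaping it) stays intact.
IFS= read -d '' -r escapedValue < <(quoteSubst "$value")

result=$(sed "s/X/$escapedValue/" <<<'aXb')
printf '%s\n' "$result"
```

Note that read -d '' returns a non-zero status when it reaches end of input, so a script running under set -e would need to tolerate that.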


Answer 3:

This can be done with bash builtins alone -- no sed, no awk, etc.

orig='{sara_ftp_username}'               # put the original value into a variable
new='dba01upc\Fusion_test'               # ...no need to 'export'!

contents=$(<Sara.xml)                    # read the file's content into a variable
new_contents=${contents//"$orig"/$new}   # use parameter expansion to replace
printf '%s' "$new_contents" >Sara.xml    # write new content to disk

See the relevant part of BashFAQ #100 for information on using parameter expansion for string substitution.
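
A hedged variation on the same idea, run against a temporary file rather than the real Sara.xml (the file and its one-line content are assumptions for the demo). It differs in two ways: $(<file) strips trailing newlines, so printf '%s\n' is used to write one back, and the replacement is quoted as "$new". As far as I know, bash 5.2 and later treat an unquoted & in the replacement as the matched text by default, and quoting avoids that.

```shell
#!/usr/bin/env bash

# Hypothetical demo file standing in for Sara.xml.
tmpfile=$(mktemp)
printf '%s\n' '<con:username>{sara_ftp_username}</con:username>' > "$tmpfile"

orig='{sara_ftp_username}'
new='dba01upc\Fusion_test'

contents=$(<"$tmpfile")                      # trailing newlines are stripped here
new_contents=${contents//"$orig"/"$new"}     # both sides quoted: literal match, literal replacement
printf '%s\n' "$new_contents" > "$tmpfile"   # restore a single trailing newline

result=$(<"$tmpfile")
lines=$(wc -l < "$tmpfile")
printf '%s\n' "$result"
rm -f "$tmpfile"
```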