Which characters need to be escaped when using Bas

2019-01-02 22:31发布

Is there any comprehensive list of characters that need to be escaped in Bash? Can it be checked just with sed?

In particular, I was checking whether % needs to be escaped or not. I tried

echo "h%h" | sed 's/%/i/g'

and worked fine, without escaping %. Does it mean % does not need to be escaped? Was this a good way to check the necessity?

And more general: are they the same characters to escape in shell and bash?

7条回答
一纸荒年 Trace。
2楼-- · 2019-01-02 22:45

Characters that need escaping are different in Bourne or POSIX shell than Bash. Generally (very) Bash is a superset of those shells, so anything you escape in shell should be escaped in Bash.

A nice general rule would be "if in doubt, escape it". But escaping some characters gives them a special meaning, like \n. These are listed in the man bash pages under Quoting and echo.

Other than that, escape any character that is not alphanumeric, it is safer. I don't know of a single definitive list.

The man pages list them all somewhere, but not in one place. Learn the language, that is the way to be sure.

One that has caught me out is !. This is a special character (history expansion) in Bash (and csh) but not in Korn shell. Even echo "Hello world!" gives problems. Using single-quotes, as usual, removes the special meaning.

查看更多
家丑人穷心不美
3楼-- · 2019-01-02 22:49

Using the print '%q' technique, we can run a loop to find out which characters are special:

#!/bin/bash
special=$'`!@#$%^&*()-_+={}|[]\\;\':",.<>?/ '
for ((i=0; i < ${#special}; i++)); do
    char="${special:i:1}"
    printf -v q_char '%q' "$char"
    if [[ "$char" != "$q_char" ]]; then
        printf 'Yes - character %s needs to be escaped\n' "$char"
    else
        printf 'No - character %s does not need to be escaped\n' "$char"
    fi
done | sort

It gives this output:

No, character % does not need to be escaped
No, character + does not need to be escaped
No, character - does not need to be escaped
No, character . does not need to be escaped
No, character / does not need to be escaped
No, character : does not need to be escaped
No, character = does not need to be escaped
No, character @ does not need to be escaped
No, character _ does not need to be escaped
Yes, character   needs to be escaped
Yes, character ! needs to be escaped
Yes, character " needs to be escaped
Yes, character # needs to be escaped
Yes, character $ needs to be escaped
Yes, character & needs to be escaped
Yes, character ' needs to be escaped
Yes, character ( needs to be escaped
Yes, character ) needs to be escaped
Yes, character * needs to be escaped
Yes, character , needs to be escaped
Yes, character ; needs to be escaped
Yes, character < needs to be escaped
Yes, character > needs to be escaped
Yes, character ? needs to be escaped
Yes, character [ needs to be escaped
Yes, character \ needs to be escaped
Yes, character ] needs to be escaped
Yes, character ^ needs to be escaped
Yes, character ` needs to be escaped
Yes, character { needs to be escaped
Yes, character | needs to be escaped
Yes, character } needs to be escaped

Some of the results, like , look a little suspicious. Would be interesting to get @CharlesDuffy's inputs on this.

查看更多
手持菜刀,她持情操
4楼-- · 2019-01-02 22:57

format that can be reused as shell input

There is a special printf format directive (%q) built for this kind of request:

printf [-v var] format [arguments]

 %q     causes printf to output the corresponding argument
        in a format that can be reused as shell input.

Some samples:

read foo
Hello world
printf "%q\n" "$foo"
Hello\ world

printf "%q\n" $'Hello world!\n'
$'Hello world!\n'

This could be used through variables too:

printf -v var "%q" "$foo
"
echo "$var"
$'Hello world\n'

Quick check with all (128) ascii bytes:

Note that all bytes from 128 to 255 have to be escaped.

for i in {0..127} ;do
    printf -v var \\%o $i
    printf -v var $var
    printf -v res "%q" "$var"
    esc=E
    [ "$var" = "$res" ] && esc=-
    printf "%02X %s %-7s\n" $i $esc "$res"
done |
    column

This must render something like:

00 E ''         1A E $'\032'    34 - 4          4E - N          68 - h      
01 E $'\001'    1B E $'\E'      35 - 5          4F - O          69 - i      
02 E $'\002'    1C E $'\034'    36 - 6          50 - P          6A - j      
03 E $'\003'    1D E $'\035'    37 - 7          51 - Q          6B - k      
04 E $'\004'    1E E $'\036'    38 - 8          52 - R          6C - l      
05 E $'\005'    1F E $'\037'    39 - 9          53 - S          6D - m      
06 E $'\006'    20 E \          3A - :          54 - T          6E - n      
07 E $'\a'      21 E \!         3B E \;         55 - U          6F - o      
08 E $'\b'      22 E \"         3C E \<         56 - V          70 - p      
09 E $'\t'      23 E \#         3D - =          57 - W          71 - q      
0A E $'\n'      24 E \$         3E E \>         58 - X          72 - r      
0B E $'\v'      25 - %          3F E \?         59 - Y          73 - s      
0C E $'\f'      26 E \&         40 - @          5A - Z          74 - t      
0D E $'\r'      27 E \'         41 - A          5B E \[         75 - u      
0E E $'\016'    28 E \(         42 - B          5C E \\         76 - v      
0F E $'\017'    29 E \)         43 - C          5D E \]         77 - w      
10 E $'\020'    2A E \*         44 - D          5E E \^         78 - x      
11 E $'\021'    2B - +          45 - E          5F - _          79 - y      
12 E $'\022'    2C E \,         46 - F          60 E \`         7A - z      
13 E $'\023'    2D - -          47 - G          61 - a          7B E \{     
14 E $'\024'    2E - .          48 - H          62 - b          7C E \|     
15 E $'\025'    2F - /          49 - I          63 - c          7D E \}     
16 E $'\026'    30 - 0          4A - J          64 - d          7E E \~     
17 E $'\027'    31 - 1          4B - K          65 - e          7F E $'\177'
18 E $'\030'    32 - 2          4C - L          66 - f      
19 E $'\031'    33 - 3          4D - M          67 - g      

Where first field is hexa value of byte, second contain E if character need to be escaped and third field show escaped presentation of character.

Why ,?

You could see some characters that don't always need to be escaped, like ,, } and {.

So not always but sometime:

echo test 1, 2, 3 and 4,5.
test 1, 2, 3 and 4,5.

or

echo test { 1, 2, 3 }
test { 1, 2, 3 }

but care:

echo test{1,2,3}
test1 test2 test3

echo test\ {1,2,3}
test 1 test 2 test 3

echo test\ {\ 1,\ 2,\ 3\ }
test  1 test  2 test  3

echo test\ {\ 1\,\ 2,\ 3\ }
test  1, 2 test  3 
查看更多
Viruses.
5楼-- · 2019-01-02 23:01

There are two easy and safe rules which work not only in sh but also bash.

1. Put the whole string in single quotes

This works for all chars except single quote itself. To escape the single quote, close the quoting before it, insert the single quote, and re-open the quoting.

'I'\''m a s@fe $tring which ends in newline
'

sed command: sed -e "s/'/'\\\\''/g; 1s/^/'/; \$s/\$/'/"

2. Escape every char with a backslash

This works for all characters except newline. For newline characters use single or double quotes. Empty strings must still be handled - replace with ""

\I\'\m\ \a\ \s\@\f\e\ \$\t\r\i\n\g\ \w\h\i\c\h\ \e\n\d\s\ \i\n\ \n\e\w\l\i\n\e"
"

sed command: sed -e 's/./\\&/g; 1{$s/^$/""/}; 1!s/^/"/; $!s/$/"/'.

2b. More readable version of 2

There's an easy safe set of characters, like [a-zA-Z0-9,._+:@%/-], which can be left unescaped to keep it more readable

I\'m\ a\ s@fe\ \$tring\ which\ ends\ in\ newline"
"

sed command: LC_ALL=C sed -e 's/[^a-zA-Z0-9,._+@%/-]/\\&/g; 1{$s/^$/""/}; 1!s/^/"/; $!s/$/"/'.


Note that in a sed program, one can't know whether the last line of input ends with a newline byte (except when it's empty). That's why both above sed commands assume it does not. You can add a quoted newline manually.

Note that shell variables are only defined for text in the POSIX sense. Processing binary data is not defined. For the implementations that matter, binary works with the exception of NUL bytes (because variables are implemented with C strings, and meant to be used as C strings, namely program arguments), but you should switch to a "binary" locale such as latin1.


(You can easily validate the rules by reading the POSIX spec for sh. For bash, check the reference manual linked by @AustinPhillips)

查看更多
爷的心禁止访问
6楼-- · 2019-01-02 23:03

To save someone else from having to RTFM... in bash:

Enclosing characters in double quotes preserves the literal value of all characters within the quotes, with the exception of $, `, \, and, when history expansion is enabled, !.

...so if you escape those (and the quote itself, of course) you're probably okay.

If you take a more conservative 'when in doubt, escape it' approach, it should be possible to avoid getting instead characters with special meaning by not escaping identifier characters (i.e. ASCII letters, numbers, or '_'). It's very unlikely these would ever (i.e. in some weird POSIX-ish shell) have special meaning and thus need to be escaped.

查看更多
Evening l夕情丶
7楼-- · 2019-01-02 23:03

I presume that you're talking about bash strings. There are different types of strings which have a different set of requirements for escaping. eg. Single quotes strings are different from double quoted strings.

The best reference is the Quoting section of the bash manual.

It explains which characters needs escaping. Note that some characters may need escaping depending on which options are enabled such as history expansion.

查看更多
登录 后发表回答