Reading assignments from configuration files

Posted 2019-07-22 20:45

I have configuration files where each line contains assignments separated by semicolons. Something like this, which mimics normal shell assignments:

VAR1="1"  ;  VAR2="2"
VAR1="3"  ;  VAR2="4"

Each line contains the same variables, and is intended to be processed individually. These configuration files are all under the system administrator's control, so using eval to perform the assignment is not too bad for now. But I would like to extend this to per-user config files, and I am looking for better ideas.

I am able to parse a line, split it in chunks using ; as a separator (in a way that unfortunately does not allow escaped ; to be found inside the values, but I can live with that), identify the assignment (valid variable name followed by = sign), and extract the right part of the assignment (in raw form, with quoting and spacing as part of the value). But then I have a problem.
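
For readers following along, the parsing step described above could look roughly like the sketch below. It is not the asker's actual code: parse_line is a hypothetical helper name, and the split on ; is deliberately naive, so an escaped or quoted ; inside a value is not handled, exactly as the question concedes.

```bash
#!/usr/bin/env bash

# parse_line (hypothetical name) prints one "NAME<TAB>raw value" pair per
# assignment found on a configuration line.
parse_line() {
  local line=$1 chunk
  local -a chunks
  # Valid variable name, '=', then the raw right-hand side (quotes and
  # spacing are kept as part of the value).
  local re='^[[:space:]]*([A-Za-z_][A-Za-z0-9_]*)=(.*)$'

  # Naive split on ';' (assumption: values never contain ';').
  IFS=';' read -r -a chunks <<<"$line"

  for chunk in "${chunks[@]}"; do
    if [[ $chunk =~ $re ]]; then
      printf '%s\t%s\n' "${BASH_REMATCH[1]}" "${BASH_REMATCH[2]}"
    fi
  done
}

parse_line 'VAR1="1"  ;  VAR2="2"'
```

The raw value still carries its quotes (and any trailing spaces before the ;), which is the form the rest of the question starts from.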

Say I have a variable named value which, after the parsing, contains what would result from a "manual" assignment like this:

value="\"Arbitrary value \\\" containing escaped quote inside quotes\""

In other words, the value is this (if I echo "$value") :

"Arbitrary value \" containing escaped quote inside quotes"

I want to transform that value without using eval or another method that could cause arbitrary code execution (and therefore code injection risks) so that it becomes this:

Arbitrary value " containing escaped quote inside quotes

I could, I guess, just look for and remove leading and trailing quotes, but this does not handle all cases of valid shell quoting. If there is a way to retain safe expansions while preventing code execution, that is a plus, but I am not getting my hopes up with this one. I would also prefer a Bash-only solution (no external program called), but this is a preference, not a hard requirement.

If I solve that issue, I know how to perform the indirect assignment safely, and I do not need detailed code on how to read files, perform regex matching, etc. It is only this critical step I am missing, and I hope there is a way that does not involve writing a parser.

Tags: bash shell
2 answers
在下西门庆
#2 · 2019-07-22 21:06

One very easy solution is to use jq. Since "foo is a string \" that contains a quote" is valid JSON, jq handles it natively:

$ value="\"Arbitrary value \\\" containing escaped quote inside quotes\""
$ jq -r . <<< "$value"
Arbitrary value " containing escaped quote inside quotes

Yes, it's not native sh or bash, but it's a quick and easy solution. Furthermore, jq has methods to output the result back to a format that can be read in by another shell:

$ jq -r '.|@sh' <<< "$value"
'Arbitrary value " containing escaped quote inside quotes'
混吃等死
#3 · 2019-07-22 21:16

To complement kojiro's helpful jq solution with a pure bash solution (a POSIX-compliant implementation is also possible):

# Sample value, resulting in the following value, *including* the double quotes:
#     "Arbitrary value \" containing escaped quote inside quotes"
# Note: This is effectively the same assignment as in the question, except
#       with single quotes, which makes it easier to parse visually.
value='"Arbitrary value \" containing escaped quote inside quotes"'    

# Strip enclosing " instances, if present.
[[ $value =~ ^\"(.*)\"$ ]] && value=${BASH_REMATCH[1]}

# Use `read` - without -r - to perform interpretation of \-prefixed
# escape sequences, and save the result back to $value.
IFS= read value <<<"$value"

Running printf '%s\n' "$value" afterward yields:

Arbitrary value " containing escaped quote inside quotes

Note:

  • If $value contained a \ followed by an actual newline (probably not a concern with configuration-file entries), that newline would be removed.

  • For any other \-prefixed character - not just \" - (only) the \ is removed.

  • No expansions of any kind are performed, and other quoting forms that the shell accepts are not handled (such as the automatic concatenation of adjacent strings "ab""cd" to yield abcd).

    • See this answer of mine for a safe templating solution that restricts expansions to embedded variable references (prevents command substitutions).
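
As a minimal sketch of the points in the notes above, the two steps can be wrapped in a function and fed a value that contains several backslash sequences and a would-be command substitution. The function name unquote_value is a hypothetical choice for illustration, not something from the answer itself.

```bash
#!/usr/bin/env bash

# unquote_value (hypothetical name) applies the two steps from the answer:
# strip one pair of enclosing double quotes, then let `read` without -r
# remove the backslash from each \<char> sequence.
unquote_value() {
  local raw=$1 result
  [[ $raw =~ ^\"(.*)\"$ ]] && raw=${BASH_REMATCH[1]}
  # `|| true` keeps going if read reports end-of-input, e.g. when the
  # value ends in a lone backslash.
  IFS= read result <<<"$raw" || true
  printf '%s\n' "$result"
}

# The value from the question.
unquote_value '"Arbitrary value \" containing escaped quote inside quotes"'

# \t and \$ merely lose their backslash; $HOME and $(date) stay literal text.
unquote_value '"a\tb \$HOME $(date)"'
```

The second call prints atb $HOME $(date): \t does not become a tab, and neither the variable reference nor the command substitution is evaluated.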

Optional background information

read - without the -r option - interprets \-based sequences only in the sense that, with the exception discussed below, it removes the \ from each \<char> sequence; it does not expand control-character escape sequences such as \n.

The only expansion of sorts that read does perform is when a \ is followed by an actual newline (LF character), in which case the newline is removed too. This points to the main purpose of \-escaping for read: line continuation.
From the POSIX spec:

By default, unless the -r option is specified, <backslash> shall act as an escape character. An unescaped <backslash> shall preserve the literal value of the following character, with the exception of a <newline>. If a <newline> follows the <backslash>, the read utility shall interpret this as line continuation. The <backslash> and <newline> shall be removed before splitting the input into fields. All other unescaped <backslash> characters shall be removed after splitting the input into fields.

The -r option turns interpretation of \ sequences off, which is the desired behavior in the vast majority of cases.
Therefore, it is advisable to use -r routinely, unless you explicitly need processing of \ sequences.
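
The difference can be seen side by side in a short sketch; the variable names (cooked, raw, joined, unjoined) are chosen here purely for illustration.

```bash
#!/usr/bin/env bash

# Sample input with three backslash sequences: \n, \" and \\
input='a\nb \" c\\d'

IFS= read    cooked <<<"$input"   # no -r: each sequence loses its backslash
IFS= read -r raw    <<<"$input"   # -r: the input is taken literally

printf 'without -r: %s\n' "$cooked"
printf 'with    -r: %s\n' "$raw"

# Line continuation: a backslash directly followed by a real newline.
two_lines=$'first \\\nsecond'

IFS= read    joined   <<<"$two_lines"   # backslash and newline both removed
IFS= read -r unjoined <<<"$two_lines"   # reading stops at the first newline

printf 'without -r: %s\n' "$joined"
printf 'with    -r: %s\n' "$unjoined"
```

Without -r the first pair yields anb " c\d and the second yields first second; with -r the input comes back unchanged (a\nb \" c\\d) and only the first line, including its trailing backslash, is read.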
