I have configuration files where each line contains assignments separated by semicolons. Something like this, which mimics normal shell assignments:
VAR1="1" ; VAR2="2"
VAR1="3" ; VAR2="4"
Each line contains the same variables and is intended to be processed individually. These configuration files are all under the system administrator's control, so using eval to perform the assignment is not too bad for now. But I would like to extend this to per-user config files, and I am looking for better ideas.
I am able to parse a line, split it into chunks using ; as a separator (in a way that unfortunately does not allow an escaped ; inside the values, but I can live with that), identify the assignment (a valid variable name followed by an = sign), and extract the right-hand side of the assignment (in raw form, with quoting and spacing as part of the value). But then I have a problem.
Say I have a variable value which, after the parsing, contains what would result from a "manual" assignment like this:
value="\"Arbitrary value \\\" containing escaped quote inside quotes\""
In other words, the value is this (if I echo "$value"):
"Arbitrary value \" containing escaped quote inside quotes"
I want to transform that value without using eval or another method that could cause arbitrary code execution (and therefore code injection risks) so that it becomes this:
Arbitrary value " containing escaped quote inside quotes
I could, I guess, just look for and remove leading and trailing quotes, but this does not handle all cases of valid shell quoting. If there is a way to retain safe expansions while preventing code execution, that is a plus, but I am not getting my hopes up with this one. I would also prefer a Bash-only solution (no external program called), but this is a preference, not a hard requirement.
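To make the limitation concrete, here is a minimal sketch of the naive approach of stripping only the enclosing quotes. The variable name stripped is just an illustration; the sample value is the one shown above.

```shell
#!/usr/bin/env bash
# Sample value from above: the raw right-hand side, quotes included.
value='"Arbitrary value \" containing escaped quote inside quotes"'

# Naive approach: strip one leading and one trailing double quote.
stripped=${value#\"}
stripped=${stripped%\"}

printf '%s\n' "$stripped"
# → Arbitrary value \" containing escaped quote inside quotes
```

The enclosing quotes are gone, but the backslash before the inner quote is still there, so the escape sequence has not been processed.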
If I solve that issue, I know how to perform the indirect assignment safely, and I do not need detailed code on how to read files, perform regex matching, etc. It is only this critical step I am missing, and I hope there is a way that does not involve writing a parser.
One very easy solution is to use jq. Since "foo is a string \" that contains a quote" is valid JSON, jq handles it natively.

Yes, it's not native sh or bash, but it's a quick and easy solution. Furthermore, jq has methods to output the result back to a format that can be read in by another shell.
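A minimal sketch of both uses, assuming jq is installed and that the value is a single double-quoted string that is also valid JSON. The variable names unquoted and requoted are only for illustration.

```shell
#!/usr/bin/env bash
# Sample value from the question: the raw right-hand side, quotes included.
value='"Arbitrary value \" containing escaped quote inside quotes"'

# -r prints the decoded JSON string itself rather than its JSON encoding.
unquoted=$(printf '%s\n' "$value" | jq -r .)
printf '%s\n' "$unquoted"
# → Arbitrary value " containing escaped quote inside quotes

# The @sh filter re-quotes the decoded string for consumption by a shell.
requoted=$(printf '%s\n' "$value" | jq -r '@sh')
printf '%s\n' "$requoted"
# → 'Arbitrary value " containing escaped quote inside quotes'
```

Note that jq applies JSON rules, not shell rules: single-quoted values or unquoted words are not valid JSON and would be rejected.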
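To complement kojiro's helpful jq solution, a pure bash solution is also possible (as is a POSIX-compliant implementation). As the background information further down explains, it relies on read without -r to remove the backslashes. Applied to the sample value from the question, running printf '%s\n' "$value" afterward yields:

Arbitrary value " containing escaped quote inside quotes

Note: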
If $value contained a \ followed by an actual newline (probably not a concern with configuration-file entries), that newline would be removed.
For any other \-prefixed character, not just \", only the \ is removed.
No expansions of any kind are performed, and other string formats that the shell supports aren't supported (such as automatic concatenation of adjacent strings, "ab""cd", to yield abcd).

Optional background information
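As a quick illustration of this background, the sketch below contrasts read with and without -r. The variable names (line, plain, raw, joined) are hypothetical, and here-documents are used so that the same lines also run under a POSIX sh.

```shell
#!/usr/bin/env bash
# A line containing a literal backslash-quote and a literal backslash-n.
line='a \" b \n c'

# Without -r: each backslash is removed and the following character is kept.
IFS= read plain <<EOF
$line
EOF
printf '%s\n' "$plain"    # → a " b n c

# With -r: backslashes are left untouched.
IFS= read -r raw <<EOF
$line
EOF
printf '%s\n' "$raw"      # → a \" b \n c

# Backslash followed by a real newline: line continuation (without -r only).
# The quoted delimiter keeps the here-document itself from joining the lines.
IFS= read joined <<'EOF'
first \
second
EOF
printf '%s\n' "$joined"   # → first second
```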
read, without the -r option, interprets \-based sequences only in the sense that, with the exception discussed below, it removes the \ from a \<char> sequence and keeps <char>; it does not expand control-character escape sequences such as \n.

The only expansion of sorts that read does perform occurs when a \ is followed by an actual newline (LF character), in which case the newline is removed too. This points to the main purpose of \-escaping for read: line continuation.

The POSIX spec for read describes a \ followed by a newline in the same terms, as line continuation.
The -r option turns interpretation of \ sequences off, which is the desired behavior in the vast majority of cases. Therefore, it is advisable to use -r routinely, unless you explicitly need processing of \ sequences.
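Unquoting a configuration value is one case where processing of \ sequences is wanted. The following is a minimal sketch of that idea, not necessarily the exact code this answer originally showed. It assumes the value is a single line wrapped in one pair of double quotes, and it uses a here-document so that it also runs under a POSIX sh; in bash, a here-string (<<<) would work as well.

```shell
#!/usr/bin/env bash
# Sample value from the question: the raw right-hand side, quotes included.
value='"Arbitrary value \" containing escaped quote inside quotes"'

# Step 1: remove one enclosing pair of double quotes, if present.
case $value in
  \"*\") value=${value#\"}; value=${value%\"} ;;
esac

# Step 2: let read (deliberately without -r) drop the backslashes.
# The expanded data is not rescanned, so no expansion or command
# substitution is performed on its contents.
IFS= read value <<EOF
$value
EOF

printf '%s\n' "$value"
# → Arbitrary value " containing escaped quote inside quotes
```

Because read consumes a single logical line, a value containing unescaped newlines would be cut at the first one. That should not arise with the one-assignment-per-chunk format described in the question.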