I'm trying to parse a JSON object within a shell script into an array.
e.g.: [Amanda, 25, http://mywebsite.com]
The JSON looks like:
{
"name" : "Amanda",
"age" : "25",
"websiteurl" : "http://mywebsite.com"
}
I do not want to use any libraries; it would be best if I could use a regular expression or grep. I have done:
grep name myfile.json
This gives me "name" : "Amanda". I could do this in a loop for each line in the file and add it to an array, but I only need the right-hand side, not the entire line.
If you really cannot use a proper JSON parser such as jq[1], try an awk-based solution:
Bash 4.x:
readarray -t values < <(awk -F\" 'NF>=3 {print $4}' myfile.json)
Bash 3.x:
IFS=$'\n' read -d '' -ra values < <(awk -F\" 'NF>=3 {print $4}' myfile.json)
This stores all property values in Bash array ${values[@]}, which you can inspect with declare -p values.
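For the sample file from the question, declare -p values should print something like the following (exact formatting varies slightly across Bash versions):
$ declare -p values
declare -a values=([0]="Amanda" [1]="25" [2]="http://mywebsite.com")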
These solutions have limitations:
- each property must be on its own line,
- all values must be double-quoted,
- embedded escaped double quotes are not supported.
All these limitations reinforce the recommendation to use a proper JSON parser.
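As an illustration of the last limitation, a value containing an embedded escaped double quote is silently truncated, because awk treats the escaped quote as just another field separator (the "motto" line is a hypothetical sample):
$ printf '%s\n' '"motto" : "say \"hi\""' | awk -F\" 'NF>=3 {print $4}'
say \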
Note: The following alternative solutions use the Bash 4.x+ readarray -t values command, but they also work with the Bash 3.x alternative, IFS=$'\n' read -d '' -ra values.
grep + cut combination: A single grep command won't do (unless you use GNU grep - see below), but adding cut helps:
readarray -t values < <(grep '"' myfile.json | cut -d '"' -f4)
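To see why it's field 4: cut splits each line on the double quotes, so field 1 is the (empty) text before the first quote, field 2 is the property name, field 3 is the text between the closing and opening quotes, and field 4 is the value:
$ printf '%s\n' '"name" : "Amanda",' | cut -d '"' -f4
Amanda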
GNU grep: Using -P to support PCREs, which support \K to drop everything matched so far (a more flexible alternative to a look-behind assertion) as well as look-ahead assertions ((?=...)):
readarray -t values < <(grep -Po ':\s*"\K.+(?="\s*,?\s*$)' myfile.json)
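For instance, applied to a single sample line, \K discards everything matched up to and including the opening quote of the value, and the look-ahead makes the match stop just before the closing quote (GNU grep required):
$ printf '%s\n' '"name" : "Amanda",' | grep -Po ':\s*"\K.+(?="\s*,?\s*$)'
Amanda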
Finally, here's a pure Bash (3.x+) solution. What makes it a viable alternative in terms of performance is that no external utilities are called in each loop iteration; for larger input files, however, a solution based on external utilities will be much faster.
#!/usr/bin/env bash
declare -a values # declare the array
# Read each line and use regex parsing (with Bash's `=~` operator)
# to extract the value.
while read -r line; do
  # Extract the value from between the double quotes
  # and add it to the array.
  [[ $line =~ :[[:blank:]]+\"(.*)\" ]] && values+=( "${BASH_REMATCH[1]}" )
done < myfile.json
declare -p values # print the array
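As a quick illustration of the =~ operator: on a successful match, BASH_REMATCH[0] holds the entire match and BASH_REMATCH[1] holds what the first capture group matched:
line='"name" : "Amanda",'
[[ $line =~ :[[:blank:]]+\"(.*)\" ]] && echo "${BASH_REMATCH[1]}"  # -> Amanda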
[1] Here's what a robust jq-based solution would look like (Bash 4.x):
readarray -t values < <(jq -r '.[]' myfile.json)
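jq's .[] filter enumerates the values of the top-level object, and -r outputs them as raw strings, without the enclosing JSON double quotes, so for the sample file you get one clean value per line:
$ jq -r '.[]' myfile.json
Amanda
25
http://mywebsite.com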
jq is good enough to solve this problem. Note that the command below assumes the records live in a top-level files array (i.e., an input shaped like { "files": [ ... ] }, rather than the single object from the question):
paste -s <(jq '.files[].name' yourfile.json) <(jq '.files[].age' yourfile.json) <(jq '.files[].websiteurl' yourfile.json)
This gives you a table, so you can grep for any row or use awk to print any column you want.
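For instance, with two records in such a files array (borrowing the samantha record from the sed example below), paste -s emits one tab-separated line per field, with one column per record:
$ paste -s <(jq '.files[].name' yourfile.json) <(jq '.files[].age' yourfile.json) <(jq '.files[].websiteurl' yourfile.json)
"Amanda"    "samantha"
"25"    "31"
"http://mywebsite.com"    "http://anotherwebsite.org"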
You can use a sed one-liner to achieve this:
array=( $(sed -n "/{/,/}/{s/[^:]*:[[:blank:]]*//p;}" json ) )
Result:
$ echo ${array[@]}
"Amanda" "25" "http://mywebsite.com"
If you do not need/want the quotation marks then the following sed will do away with them:
array=( $(sed -n '/{/,/}/{s/[^:]*:[^"]*"\([^"]*\).*/\1/p;}' json) )
Result:
$ echo ${array[@]}
Amanda 25 http://mywebsite.com
It will also work if you have multiple entries, like
$ cat json
{
"name" : "Amanda"
"age" : "25"
"websiteurl" : "http://mywebsite.com"
}
{
"name" : "samantha"
"age" : "31"
"websiteurl" : "http://anotherwebsite.org"
}
$ echo ${array[@]}
Amanda 25 http://mywebsite.com samantha 31 http://anotherwebsite.org
UPDATE:
As pointed out by mklement0 in the comments, there might be an issue if the file contains values with embedded whitespace, e.g., "name" : "Amanda lastname". In this case, Amanda and lastname would each be read into a separate array field. To avoid this, you can use readarray, e.g.,
readarray -t array < <(sed -n '/{/,/}/{s/[^:]*:[^"]*"\([^"]*\).*/\1/p;}' json2)
This also takes care of the globbing issues mentioned in the comments.
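For example, with the hypothetical line "name" : "Amanda lastname" in json2, the whitespace-containing value now lands in a single array element:
$ readarray -t array < <(sed -n '/{/,/}/{s/[^:]*:[^"]*"\([^"]*\).*/\1/p;}' json2)
$ declare -p array
declare -a array=([0]="Amanda lastname" [1]="25" [2]="http://mywebsite.com")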