Create JSON using jq from pipe-separated keys and values

Question:

I am trying to create a JSON object from a string in bash. The string is as follows:

CONTAINER|CPU%|MEMUSAGE/LIMIT|MEM%|NETI/O|BLOCKI/O|PIDS
nginx_container|0.02%|25.09MiB/15.26GiB|0.16%|0B/0B|22.09MB/4.096kB|0

The output is from the docker stats command, and my end goal is to publish custom metrics to AWS CloudWatch. I would like to format this string as JSON:

{
    "CONTAINER":"nginx_container",
    "CPU%":"0.02%", 
    ....
}

I have used the jq command before, and it seems like it should work well in this case, but I have not been able to come up with a good solution yet, other than hardcoding variable names, indexing with sed or awk, and then building the JSON from scratch. Any suggestions would be appreciated. Thanks.

Answer 1:

Prerequisite

For all of the below, it's assumed that your content is in a shell variable named s:

s='CONTAINER|CPU%|MEMUSAGE/LIMIT|MEM%|NETI/O|BLOCKI/O|PIDS
nginx_container|0.02%|25.09MiB/15.26GiB|0.16%|0B/0B|22.09MB/4.096kB|0'

What (modern jq)

# thanks to @JeffMercado and @chepner for refinements, see comments
jq -Rn '
( input  | split("|") ) as $keys |
( inputs | split("|") ) as $vals |
[[$keys, $vals] | transpose[] | {key:.[0],value:.[1]}] | from_entries
' <<<"$s"

How (modern jq)

This requires a fairly recent jq (1.5 or later) to work, and is a dense chunk of code. To break it down:

  • Using -n prevents jq from reading stdin on its own, leaving the entirety of the input stream available to be read by input and inputs -- the former to read a single line, and the latter to read all remaining lines. (-R, for raw input, causes textual lines rather than JSON objects to be read).
  • With [$keys, $vals] | transpose[], we're generating [key, value] pairs (in Python terms, zipping the two lists).
  • With {key:.[0],value:.[1]}, we're making each [key, value] pair into an object of the form {"key": key, "value": value}.
  • With from_entries, we're combining those pairs into objects containing those keys and values (a standalone illustration follows this list).
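
As a standalone illustration of the transpose/from_entries plumbing (using map(...), an equivalent spelling of the [... | transpose[] | {...}] construction above, on two hardcoded columns taken from the sample data):

jq -n '[["CONTAINER","CPU%"],["nginx_container","0.02%"]]
       | transpose
       | map({key: .[0], value: .[1]})
       | from_entries'

...which pretty-prints:

{
  "CONTAINER": "nginx_container",
  "CPU%": "0.02%"
}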

What (shell-assisted)

This will work with a significantly older jq than the above, and is an easily adopted approach for scenarios where a native-jq solution can be harder to wrangle:

{
  IFS='|' read -r -a keys # read first line into an array of strings

  ## read each subsequent line into an array named "values"
  while IFS='|' read -r -a values; do

    # setup: positional arguments to pass in literal variables, query with code
    jq_args=( )
    jq_query='.'

    # copy values into the arguments, reference them from the generated code
    for idx in "${!values[@]}"; do
      [[ ${keys[$idx]} ]] || continue # skip values with no corresponding key
      jq_args+=( --arg "key$idx"   "${keys[$idx]}"   )
      jq_args+=( --arg "value$idx" "${values[$idx]}" )
      jq_query+=" | .[\$key${idx}]=\$value${idx}"
    done

    # run the generated command
    jq "${jq_args[@]}" "$jq_query" <<<'{}'
  done
} <<<"$s"

How (shell-assisted)

The invoked jq command from the above is similar to:

jq --arg key0   'CONTAINER' \
   --arg value0 'nginx_container' \
   --arg key1   'CPU%' \
   --arg value1 '0.02%' \
   --arg key2   'MEMUSAGE/LIMIT' \
   --arg value2 '25.09MiB/15.26GiB' \
   '. | .[$key0]=$value0 | .[$key1]=$value1 | .[$key2]=$value2' \
   <<<'{}'

...passing each key and value out-of-band (such that it's treated as a literal string rather than parsed as JSON), then referring to them individually.
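
For contrast, jq also accepts --argjson, which does parse its value as JSON; putting the two side by side shows what "treated as a literal string" means in practice:

jq -nc --arg a '1' --argjson b '1' '{a: $a, b: $b}'

...prints {"a":"1","b":1}: the --arg value stays a literal string, while the --argjson value is parsed as a JSON number.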


Result

Either of the above will emit:

{
  "CONTAINER": "nginx_container",
  "CPU%": "0.02%",
  "MEMUSAGE/LIMIT": "25.09MiB/15.26GiB",
  "MEM%": "0.16%",
  "NETI/O": "0B/0B",
  "BLOCKI/O": "22.09MB/4.096kB",
  "PIDS": "0"
}

Why

In short: Because it's guaranteed to generate valid JSON as output.

Consider the following as an example that would break more naive approaches:

s='key ending in a backslash\
value "with quotes"'

Sure, these are unexpected scenarios, but jq knows how to deal with them:

{
  "key ending in a backslash\\": "value \"with quotes\""
}

...whereas an implementation that didn't understand JSON strings could easily end up emitting:

{
  "key ending in a backslash\": "value "with quotes""
}
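
The same escaping can be checked in isolation with a single --arg pair, the mechanism the generated command above relies on (a minimal sketch; the key and value are just the two strings from this example):

jq -n --arg key 'key ending in a backslash\' \
      --arg value 'value "with quotes"' \
      '{($key): $value}'

...which emits exactly the correctly escaped object shown above.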


Answer 2:

You can ask docker to give you JSON data in the first place:

docker stats --format "{{json .}}"

For more on this, see: https://docs.docker.com/config/formatting/
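
For example, a one-shot sample pretty-printed with jq (assuming the Docker CLI is available; --no-stream makes docker stats emit a single sample instead of a continuously updating stream, and note that the field names in Docker's own JSON differ from the pipe-separated headers in the question):

docker stats --no-stream --format '{{json .}}' | jq .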



Answer 3:

I know this is an old post, but the tool you seek is called jo: https://github.com/jpmens/jo

A quick and easy example:

$ jo my_variable="simple"
{"my_variable":"simple"}

A little more complex:

$ jo -p name=jo n=17 parser=false
{
  "name": "jo",
  "n": 17,
  "parser": false
}

Add an array:

$ jo -p name=jo n=17 parser=false my_array=$(jo -a {1..5})
{
  "name": "jo",
  "n": 17,
  "parser": false,
  "my_array": [
    1,
    2,
    3,
    4,
    5
  ]
}

I've made some pretty complex things with jo, and the nice thing is that you don't have to worry about rolling your own solution and possibly producing invalid JSON.
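
As a quick check of that claim, values containing JSON-significant characters come out escaped rather than breaking the output (the key name here is just an arbitrary example):

$ jo note='value "with quotes"'
{"note":"value \"with quotes\""}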



Answer 4:

JSONSTR=""
declare -a JSONNAMES=()
declare -A JSONARRAY=()
LOOPNUM=0

cat ~/newfile | while IFS='|' read -r CONTAINER CPU MEMUSE MEMPC NETIO BLKIO PIDS; do
    if [[ "$LOOPNUM" = 0 ]]; then
        JSONNAMES=("$CONTAINER" "$CPU" "$MEMUSE" "$MEMPC" "$NETIO" "$BLKIO" "$PIDS")
        LOOPNUM=$(( LOOPNUM+1 ))
    else
        echo "{ \"${JSONNAMES[0]}\": \"${CONTAINER}\", \"${JSONNAMES[1]}\": \"${CPU}\", \"${JSONNAMES[2]}\": \"${MEMUSE}\", \"${JSONNAMES[3]}\": \"${MEMPC}\", \"${JSONNAMES[4]}\": \"${NETIO}\", \"${JSONNAMES[5]}\": \"${BLKIO}\", \"${JSONNAMES[6]}\": \"${PIDS}\" }"
    fi 
done

Returns:

{ "CONTAINER": "nginx_container", "CPU%": "0.02%", "MEMUSAGE/LIMIT": "25.09MiB/15.26GiB", "MEM%": "0.16%", "NETI/O": "0B/0B", "BLOCKI/O": "22.09MB/4.096kB", "PIDS": "0" }


Answer 5:

Here is a solution which uses the -R and -s options along with transpose:

   split("\n")                       # [ "CONTAINER...", "nginx_container|0.02%...", ...]
 | (.[0]    | split("|")) as $keys   # [ "CONTAINER", "CPU%", "MEMUSAGE/LIMIT", ... ]
 | (.[1:][] | split("|"))            # [ "nginx_container", "0.02%", ... ] [ ... ] ...
 | select(length > 0)                # (remove empty [] caused by trailing newline)
 | [$keys, .]                        # [ ["CONTAINER", ...], ["nginx_container", ...] ] ...
 | [ transpose[] | {(.[0]):.[1]} ]   # [ {"CONTAINER": "nginx_container"}, ... ] ...
 | add                               # {"CONTAINER": "nginx_container", "CPU%": "0.02%" ...
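
One possible invocation, assuming the filter above is saved in a file named pipe2json.jq (a name chosen here for illustration) and the two input lines are in the shell variable s from the first answer:

jq -R -s -f pipe2json.jq <<<"$s"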


Answer 6:

# a literal % in a printf format string must be written as %%
json_template='{"CONTAINER":"%s","CPU%%":"%s","MEMUSAGE/LIMIT":"%s", "MEM%%":"%s","NETI/O":"%s","BLOCKI/O":"%s","PIDS":"%s"}'
json_string=$(printf "$json_template" "nginx_container" "0.02%" "25.09MiB/15.26GiB" "0.16%" "0B/0B" "22.09MB/4.096kB" "0")
echo "$json_string"

This doesn't use jq, but it is possible to use script arguments and environment variables as the values:

CONTAINER=nginx_container
# a literal % in a printf format string must be written as %%
json_template='{"CONTAINER":"%s","CPU%%":"%s","MEMUSAGE/LIMIT":"%s", "MEM%%":"%s","NETI/O":"%s","BLOCKI/O":"%s","PIDS":"%s"}'
json_string=$(printf "$json_template" "$CONTAINER" "$1" "25.09MiB/15.26GiB" "0.16%" "0B/0B" "22.09MB/4.096kB" "0")
echo "$json_string"



Answer 7:

If you're starting with tabular data, I think it makes more sense to use something that works with tabular data natively, like sqawk, to turn it into JSON, and then use jq to work with it further.

echo 'CONTAINER|CPU%|MEMUSAGE/LIMIT|MEM%|NETI/O|BLOCKI/O|PIDS
nginx_container|0.02%|25.09MiB/15.26GiB|0.16%|0B/0B|22.09MB/4.096kB|0' \
        | sqawk -FS '[|]' -RS '\n' -output json 'select * from a' header=1 \
        | jq '.[] | with_entries(select(.key|test("^a.*")|not))'

    {
      "CONTAINER": "nginx_container",
      "CPU%": "0.02%",
      "MEMUSAGE/LIMIT": "25.09MiB/15.26GiB",
      "MEM%": "0.16%",
      "NETI/O": "0B/0B",
      "BLOCKI/O": "22.09MB/4.096kB",
      "PIDS": "0"
    }

Without jq, sqawk gives a bit too much:

[
  {
    "anr": "1",
    "anf": "7",
    "a0": "nginx_container|0.02%|25.09MiB/15.26GiB|0.16%|0B/0B|22.09MB/4.096kB|0",
    "CONTAINER": "nginx_container",
    "CPU%": "0.02%",
    "MEMUSAGE/LIMIT": "25.09MiB/15.26GiB",
    "MEM%": "0.16%",
    "NETI/O": "0B/0B",
    "BLOCKI/O": "22.09MB/4.096kB",
    "PIDS": "0",
    "a8": "",
    "a9": "",
    "a10": ""
  }
]