Parsing JSON with awk/sed in bash to get key-value pairs

Posted 2019-04-02 01:28

I have read many existing questions on SO, but none of them answers what I am looking for. I know it is difficult to parse JSON in bash using sed/awk, but I only need a few key-value pairs per record out of the whole list of key-value pairs in each record. I want to do this because it should be faster: the main JSON is pretty big, with millions of records.

The JSON format is like the following:

{
    "documents":
    [
        {
            "title":"a",   //needed
            "description":"b",  //needed
            "id":"c",  //needed
            ....(some more:not useful)....
            "conversation":
            [
                {
                    "message":"",
                    "id":"d",   //not needed
                    .....(some more)....
                    "createDate":"e",   //not needed
                },
                ...(some more messages)....
            ],
            "createDate":"f",  //needed
            ....(many more labels).....
        }
    ],
    ....(some more global attributes)....
}

I need the attributes marked as needed above, but the same keys also appear inside the nested conversation objects, which makes them hard to extract with simple sed/awk. Could anyone suggest whether this can be done with sed/awk? If possible, any help to achieve this would be appreciated.

P.S.: I know about jsawk, but I do not want to introduce any dependency, so if possible please suggest a way to do it with sed/awk.

EDIT: The expected output is multiple entries in the format given below (one per document, since documents is a list):

"title":"a",
"description":"b"
"id":"c"
"createDate":"f"

EDIT: The actual JSON contains no whitespace; it has been formatted here for readability.

3 Answers
Bombasti
2019-04-02 01:45

I would advise that you use jq or another real JSON parser. You can't "parse" JSON with arbitrary regular expressions. You could hack something together with awk, but it will break easily if your input takes a form you didn't anticipate.

So, the answer is: introduce a cheap dependency (jq or a similar tool) and script around that. Unless you're running this on a router or an embedded computer, chances are you can easily install jq.
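
For reference, a minimal jq sketch of what that could look like (the input file name input.json is an assumption, not from the question): it pulls only the wanted top-level fields of each document and skips the nested "conversation" array entirely.

# jq sketch: emit one small object per document containing just the needed keys
jq '.documents[] | {title, description, id, createDate}' input.json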

爷的心禁止访问
2019-04-02 01:52

If the bracket characters {, }, [ and ] always appear isolated as the last field on their line, this would work:

#!/usr/bin/awk -f

# Recursively descend through the JSON, reading one line per getline call.
# "level" is the current nesting depth; "end" is the regex matching the
# token that closes the object or array we are currently inside.
function walk(level, end) {
    while (getline > 0) {
        if (level && $NF ~ end) {
            return
        }
        if ($NF == "{") {
            walk(level + 1, "},?")
        } else if ($NF == "[") {
            walk(level + 1, "],?")
        } else if (level == 3 && match($0, /"(title|description|id|createDate)":"[^"]*"/)) {
            # Depth 3 is a document object inside the "documents" array, so
            # the same key names nested inside "conversation" are never printed.
            print substr($0, RSTART, RLENGTH)
        }
    }
}

BEGIN {
    walk(0)
    exit
}
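
Hypothetical usage (the file names extract.awk and input.json are assumptions): save the script and run it over the JSON file.

awk -f extract.awk input.json
# or make it executable and run it directly:
chmod +x extract.awk
./extract.awk input.json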

Input:

{
"documents":
[
{
"title":"a",   //needed
"description":"b",  //needed
"id":"c",  //needed
....(some more:not useful)....
"conversation":
[
{
"message":"",
"id":"d",   //not needed
.....(some more)....
"createDate":"e",   //not needed
},
...(some more messages)....
],
"createDate":"f",  //needed
....(many more labels).....
}
],
....(some more global attributes)....
}

Output:

"title":"a"
"description":"b"
"id":"c"
"createDate":"f"
何必那么认真
2019-04-02 01:58

Well, if you're going to use a regex to parse JSON (which will by nature be quick, dirty, and heavily reliant on the exact syntax of the input file), you could write something that relies on the amount of whitespace occurring before the key-value pairs you're interested in. Depending on the kind of output you're looking for, you could use something along the lines of:

awk '/^ {12}"title/
/^ {12}"description/
/^ {12}"id/
/^ {12}"createDate/' input_file.json

Not great, but it does the trick on your example input...
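
The same whitespace-based idea could also be written with sed; a rough sketch, assuming GNU sed (for the \| alternation) and the same 12-space indentation:

# print only lines indented by exactly 12 spaces that start one of the wanted keys
sed -n '/^ \{12\}"\(title\|description\|id\|createDate\)"/p' input_file.json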
