
Question:

I have read many existing questions on SO, but none of them answers what I am looking for. I know it is difficult to parse JSON in bash using sed/awk, but I only need a few key-value pairs per record out of the whole list of key-value pairs in each record. I want to do it this way because I expect it to be faster: the main JSON is pretty big, with millions of records.

The JSON format is like following:

{
    "documents":
    [
        {
            "title":"a",   //needed
            "description":"b",  //needed
            "id":"c",  //needed
            ....(some more:not useful)....
            "conversation":
            [
                {
                    "message":"",
                    "id":"d",   //not needed
                    .....(some more)....
                    "createDate":"e",   //not needed
                },
                ...(some more messages)....
            ],
            "createDate":"f",  //needed
            ....(many more labels).....
        }
    ],
    ....(some more global attributes)....
}

I need the attributes marked as needed, but some of their keys (such as id and createDate) also occur in the nested conversation objects, which makes them hard to pick out with simple sed/awk. Could anyone suggest whether this can be done with sed/awk? Any help achieving it would be appreciated.

P.S.: I know about jsawk but I do not want to introduce any dependency, so if possible please suggest usage of sed/awk.

EDIT: The desired output is multiple entries in the format given below (one per document, since documents is a list):

"title":"a",
"description":"b"
"id":"c"
"createDate":"f"

EDIT: The actual JSON contains no whitespace. It has been formatted here for readability.

Answer 1:

I would advise that you use 'jq', or another real JSON parser. You can't reliably "parse" JSON with arbitrary regular expressions. You could hack something together with awk, but it will break easily as soon as your input has a form you didn't anticipate.

So, the answer is, introduce a cheap dependency (jq, or similar tool), and script around that. Unless you're running this script in a router or an embedded computer, chances are you can easily install jq.
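
As a rough illustration of what scripting around jq could look like, here is a minimal sketch. The function name extract_fields is made up for this example, and it assumes the input is valid JSON with a top-level "documents" array whose elements carry the four keys from the question.

```shell
# Hypothetical helper (the name is an assumption, not part of jq):
# print title, description, id and createDate of every document as one
# tab-separated line. Only the top-level keys of each document are read,
# so the "id" and "createDate" inside "conversation" are ignored.
extract_fields() {
    jq -r '.documents[] | [.title, .description, .id, .createDate] | @tsv' "$1"
}

# Usage: extract_fields input_file.json
```

For inputs too large to hold in memory, jq's --stream option may be worth a look, though the filter would have to be rewritten for it.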

Answer 2:

If every {, [, } and ] (the closing ones optionally followed by a comma) always sits on a line of its own, this would work:

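#!/usr/bin/awk -f

# Read lines until the bracket that closes the current nesting level.
#   level: current nesting depth (0 = outside the top-level object)
#   end:   regex matching the closing bracket of this level
function walk(level, end) {
    while (getline > 0) {
        # Closing bracket of the current level: return to the caller.
        if (level && $NF ~ end) {
            return
        }
        if ($NF == "{") {
            walk(level + 1, "},?")
        } else if ($NF == "[") {
            walk(level + 1, "],?")
        } else if (level == 3 && match($0, /"(title|description|id|createDate)":"[^"]*"/)) {
            # Level 3 is the body of each object in "documents".
            print substr($0, RSTART, RLENGTH)
        }
    }
}

BEGIN {
    walk(0)
    exit
}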

Input:

{
"documents":
[
{
"title":"a",   //needed
"description":"b",  //needed
"id":"c",  //needed
....(some more:not useful)....
"conversation":
[
{
"message":"",
"id":"d",   //not needed
.....(some more)....
"createDate":"e",   //not needed
},
...(some more messages)....
],
"createDate":"f",  //needed
....(many more labels).....
}
],
....(some more global attributes)....
}

Output:

"title":"a"
"description":"b"
"id":"c"
"createDate":"f"

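The question's second EDIT says the real JSON contains no whitespace, so the brackets would not be on lines of their own. One way around that, sketched here under assumptions, is to scan the input character by character and track string state and nesting depth instead of relying on line layout. The function name extract_doc_fields is hypothetical, the wanted values are assumed to be JSON strings, and character-wise scanning in awk may well be slow on very large inputs.

```shell
# Hypothetical helper (not from the answer above): print the wanted
# key/value pairs found at nesting depth 3, i.e. directly inside each
# element of "documents". Works on minified and on formatted input.
extract_doc_fields() {
    awk '
    BEGIN {
        # Keys to report; their values are assumed to be JSON strings.
        n = split("title description id createDate", names, " ")
        for (k = 1; k <= n; k++) want[names[k]] = 1
    }
    {
        len = length($0)
        for (i = 1; i <= len; i++) {
            c = substr($0, i, 1)
            if (in_str) {
                if (esc) {                 # character after a backslash
                    esc = 0; str = str c
                } else if (c == "\\") {
                    esc = 1; str = str c
                } else if (c != "\"") {
                    str = str c
                } else {
                    # Closing quote: the string is either a key or a value.
                    in_str = 0
                    if (depth == 3) {
                        if (expect_val) {
                            if (key in want) print "\"" key "\":\"" str "\""
                            expect_val = 0
                        } else {
                            key = str
                        }
                    }
                }
            } else if (c == "\"") {
                in_str = 1; str = ""
            } else if (c == ":") {
                expect_val = (depth == 3)  # a value follows the current key
            } else if (c == "{" || c == "[") {
                depth++; expect_val = 0
            } else if (c == "}" || c == "]") {
                depth--; expect_val = 0
            } else if (c == ",") {
                expect_val = 0
            }
        }
    }' "$@"
}

# Usage: extract_doc_fields input_file.json
```

Because brackets inside string values are skipped while in_str is set, a message containing } or ] does not disturb the depth count, which is the main weakness of purely line-based matching.
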
Answer 3:

Well, if you're going to use a regex to parse JSON, the result will by nature be quick, dirty and heavily reliant on the exact syntax of the input file. In that spirit, you could write something that relies on the amount of whitespace before the key-value pairs you're interested in. Depending on the kind of output you're looking for, you could use something along the lines of:

awk '/^ {12}"title/
/^ {12}"description/
/^ {12}"id/
/^ {12}"createDate/' input_file.json

Not great, but it does the trick on your example input...
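
A variant of the same idea, offered only as a sketch: as far as I know, some awk implementations (older mawk or gawk releases) do not enable interval expressions such as {12} by default, and the patterns above would also match longer keys that merely start with "id" or "title". The hypothetical function extract_by_indent below builds the 12-space indent explicitly and matches the complete key name. Like the original, it only works on input indented exactly as in the question's example.

```shell
# Hypothetical variant: no interval expression, full key names only.
# printf '%12s' '' produces the 12 spaces of indentation used in the example.
extract_by_indent() {
    awk -v indent="$(printf '%12s' '')" '
        BEGIN { pat = "^" indent "\"(title|description|id|createDate)\":" }
        $0 ~ pat    # print every line matching the indent plus a wanted key
    ' "$@"
}

# Usage: extract_by_indent input_file.json
```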

Parsing json with awk/sed in bash to get key value
