Spark Scala - update/add a new column in a JSON object

Posted 2019-09-20 00:04

Question:

I want to update an array of objects inside an existing JSON object with content from another JSON object.

Initial object:

{
    "user": "gT35Hhhre9m",
    "date": "2016-01-29",
    "status": "OK",
    "reason": "some reason",
    "content": [
        {
            "foo": 123,
            "bar": "val1"
        }
    ]
}

Supplementary object:

{
    "id": "gT35Hhhre9m"
}

Post-merge object structure:

{
    "user": "gT35Hhhre9m",
    "date": "2016-01-29",
    "status": "OK",
    "reason": "some reason",
    "content": [{
        "foo": 123,
        "bar": "val1",
        "id": "gT35Hhhre9m"
    }]
}

Answer 1:

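  1. Flatten the "Initial object" (one row per element of the "content" array) and treat the Spark DataFrame as columnar data, like a SQL table.
  2. Apply the transformations you need (for example, a join of `user` against `id`).
  3. Convert the resulting DataFrame back to JSON.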

The trick is not to think of the DataFrame as JSON.
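A minimal sketch of those three steps, assuming Spark 2.2 or later (for `spark.read.json` on a `Dataset[String]`) and that each element of "content" gets the `id` of the supplementary record whose `id` equals `user`. The names `MergeJsonSketch`, `mergeJson` and the helper column `rowId` are hypothetical, not from the original answer. `explode` drops records whose "content" is empty or null; `explode_outer` would keep them.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

// Hypothetical names (MergeJsonSketch, mergeJson, rowId): an illustrative sketch only.
object MergeJsonSketch {
  // Sample inputs from the question, one JSON document per line.
  val initialSample: Seq[String] = Seq(
    """{"user":"gT35Hhhre9m","date":"2016-01-29","status":"OK","reason":"some reason","content":[{"foo":123,"bar":"val1"}]}"""
  )
  val supplementarySample: Seq[String] = Seq("""{"id":"gT35Hhhre9m"}""")

  def mergeJson(spark: SparkSession,
                initialLines: Seq[String],
                supplementaryLines: Seq[String]): Array[String] = {
    import spark.implicits._

    val initial = spark.read.json(initialLines.toDS())
    val supplementary = spark.read.json(supplementaryLines.toDS())

    // 1. Flatten: tag each source record, then emit one row per "content" element.
    val flat = initial
      .withColumn("rowId", monotonically_increasing_id())
      .withColumn("c", explode($"content"))
      .select($"rowId", $"user", $"date", $"status", $"reason",
              $"c.foo".as("foo"), $"c.bar".as("bar"))

    // 2. Transform: an ordinary table join of user against id.
    val joined = flat.join(supplementary, flat("user") === supplementary("id"), "left")

    // 3. Rebuild the array of structs (now including "id") and serialise to JSON.
    joined
      .groupBy($"rowId", $"user", $"date", $"status", $"reason")
      .agg(collect_list(struct($"foo", $"bar", $"id")).as("content"))
      .drop("rowId")
      .toJSON
      .collect()
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("merge-json-sketch")
      .master("local[*]")
      .getOrCreate()
    try mergeJson(spark, initialSample, supplementarySample).foreach(println)
    finally spark.stop()
  }
}
```

The `rowId` column is there so that two input records with identical top-level fields are not merged when the rows are grouped back together; if `user` is already a unique key, grouping on it alone would do.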