I read a huge file into a DataFrame. Each line of the file holds a JSON object like the following:
    {
      "userId": "12345",
      "vars": {
        "test_group": "group1",
        "brand": "xband"
      },
      "modules": [
        { "id": "New" },
        { "id": "Default" },
        { "id": "BestValue" },
        { "id": "Rating" },
        { "id": "DeliveryMin" },
        { "id": "Distance" }
      ]
    }
How can I manipulate the DataFrame so that it keeps only the module with id="Default"? In other words, how do I drop all the other modules whose id does not equal "Default"?
As you said, each line of your file holds `json` in the format given in the question. If that's true, then you can use the `json` API of `sqlContext` to read the `json` file into a `dataframe`.
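The read step could look roughly like the sketch below. It is an illustration under assumptions, not the original snippet: it assumes Spark 2.2 or later, builds a local `SparkSession` only so that the example is self-contained, and feeds the sample record from an in-memory `Dataset[String]` instead of a real file. The object name `ReadJsonLines` and the path in the comment are hypothetical.

```scala
import org.apache.spark.sql.SparkSession

object ReadJsonLines {
  def main(args: Array[String]): Unit = {
    // Local session for illustration only; in a shell, `spark` / `sqlContext` already exist.
    val spark = SparkSession.builder()
      .appName("read-json-lines")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._
    val sqlContext = spark.sqlContext

    // One JSON object per line, as in the question (sample record inlined here).
    val sample = Seq(
      """{"userId":"12345","vars":{"test_group":"group1","brand":"xband"},"modules":[{"id":"New"},{"id":"Default"},{"id":"BestValue"},{"id":"Rating"},{"id":"DeliveryMin"},{"id":"Distance"}]}"""
    ).toDS()

    // With a real file this would be: sqlContext.read.json("/path/to/file.json")  (hypothetical path)
    val df = sqlContext.read.json(sample)

    df.printSchema()   // inspect the inferred schema
    df.show(false)     // print rows without truncating long values

    spark.stop()
  }
}
```

Reading from a path works the same way, as long as each line of the file is one complete JSON object.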
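This should give you a `dataframe` with three columns: `modules`, `userId` and `vars`. In the inferred `schema`, `modules` is an array of structs with a single string field `id`, `userId` is a string, and `vars` is a struct with the string fields `brand` and `test_group`.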
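The final step would be to `filter` so that only the `modules.id` with `Default` as its value is kept. For the sample record this should leave a single row, with `Default` in `modules` next to the unchanged `userId` and `vars`.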
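One way to express this filter step is sketched below, assuming the `explode` and `col` functions from `org.apache.spark.sql.functions` and the same in-memory sample as input. It is a sketch of the idea rather than the original snippet; the names `FilterDefaultModule` and `filtered` are made up for the example.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, explode}

object FilterDefaultModule {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("filter-default-module")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    val sample = Seq(
      """{"userId":"12345","vars":{"test_group":"group1","brand":"xband"},"modules":[{"id":"New"},{"id":"Default"},{"id":"BestValue"},{"id":"Rating"},{"id":"DeliveryMin"},{"id":"Distance"}]}"""
    ).toDS()
    val df = spark.sqlContext.read.json(sample)

    // modules.id is an array of strings; explode turns it into one row per id,
    // and the filter keeps only the row whose id is "Default".
    val filtered = df
      .withColumn("modules", explode(col("modules.id")))
      .filter(col("modules") === "Default")

    filtered.show(false)

    spark.stop()
  }
}
```

Note that after this step `modules` is a plain string column, because the array that was exploded contained only the `id` values.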
I hope the answer is helpful
Updated
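Converting the filtered result to `json` would give `modules` as a plain string holding the `id` value. But if your requirement is to keep `modules` as an object with its `id` field, you should be exploding `modules` and not `modules.id`.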
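A sketch of that variant follows, again assuming Spark 2.2 or later and the in-memory sample record. Here the whole `modules` array is exploded, so each row carries a struct, and the filter looks inside it through `modules.id`. The names `ExplodeModulesStruct` and `asJson` are hypothetical.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, explode}

object ExplodeModulesStruct {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("explode-modules-struct")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    val sample = Seq(
      """{"userId":"12345","vars":{"test_group":"group1","brand":"xband"},"modules":[{"id":"New"},{"id":"Default"},{"id":"BestValue"},{"id":"Rating"},{"id":"DeliveryMin"},{"id":"Distance"}]}"""
    ).toDS()
    val df = spark.sqlContext.read.json(sample)

    // Explode the array of structs itself, so each row keeps a struct with an id field.
    val asJson = df
      .withColumn("modules", explode(col("modules")))
      .filter(col("modules.id") === "Default")
      .toJSON   // Dataset[String], one JSON document per remaining row

    asJson.collect().foreach(println)

    spark.stop()
  }
}
```

Because the struct is preserved, the JSON written for each remaining row keeps `modules` as a nested object rather than a bare string.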