Pandas remove null values when to_json

2020-06-09 04:23发布

问题:

i have actually a pandas dataframe and i want to save it to json format. From the pandas docs it says:

Note NaN‘s, NaT‘s and None will be converted to null and datetime objects will be converted based on the date_format and date_unit parameters

Then using the orient option records i have something like this

[{"A":1,"B":4,"C":7},{"A":null,"B":5,"C":null},{"A":3,"B":null,"C":null}]

Is it possible to have this instead:

[{"A":1,"B":4,"C":7},{"B":5},{"A":3}]'

Thank you

回答1:

The following gets close to what you want, essentially we create a list of the non-NaN values and then call to_json on this:

In [136]:
df.apply(lambda x: [x.dropna()], axis=1).to_json()

Out[136]:
'{"0":[{"a":1.0,"b":4.0,"c":7.0}],"1":[{"b":5.0}],"2":[{"a":3.0}]}'

creating a list is necessary here otherwise it will try to align the result with your original df shape and this will reintroduce the NaN values which is what you want to avoid:

In [138]:
df.apply(lambda x: pd.Series(x.dropna()), axis=1).to_json()

Out[138]:
'{"a":{"0":1.0,"1":null,"2":3.0},"b":{"0":4.0,"1":5.0,"2":null},"c":{"0":7.0,"1":null,"2":null}}'

also calling list on the result of dropna will broadcast the result with the shape, like filling:

In [137]:
df.apply(lambda x: list(x.dropna()), axis=1).to_json()

Out[137]:
'{"a":{"0":1.0,"1":5.0,"2":3.0},"b":{"0":4.0,"1":5.0,"2":3.0},"c":{"0":7.0,"1":5.0,"2":3.0}}'


回答2:

The solution above doesn't actually produce results in the 'records' format. This solution also uses the json package, but produces exactly the result asked for in the original question.

import pandas as pd
import json

json.dumps([row.dropna().to_dict() for index,row in df.iterrows()])

Additionally, if you want to include the index (and you are on Python 3.5+) you can do:

json.dumps([{'index':index, **row.dropna().to_dict()} for index,row in df.iterrows()])


回答3:

I got the same problem and my solution is use json module instead of pd.DataFrame.to_json()

My solution is

  1. drop the NaN value when converting DataFrame to dict, and then
  2. convert dict to json using json.dumps()

Here is the code:

import pandas as pd
import json
from pandas import compat

def to_dict_dropna(df):
   return {int(k): v.dropna().astype(int).to_dict() for k, v in compat.iteritems(df)}

json.dumps(to_dict_dropna(df))