I have a list of dictionaries like this:
[{'points': 50, 'time': '5:00', 'year': 2010},
{'points': 25, 'time': '6:00', 'month': "february"},
{'points':90, 'time': '9:00', 'month': 'january'},
{'points_h1':20, 'month': 'june'}]
And I want to turn this into a pandas DataFrame
like this:
month points points_h1 time year
0 NaN 50 NaN 5:00 2010
1 february 25 NaN 6:00 NaN
2 january 90 NaN 9:00 NaN
3 june NaN 20 NaN NaN
Note: Order of the columns does not matter.
How can I turn the list of dictionaries into a pandas DataFrame as shown above?
You can also use
pd.DataFrame.from_dict(d)
as :In pandas 16.2, I had to do
pd.DataFrame.from_records(d)
to get this to work.pd.DataFrame
,pd.DataFrame.from_records
, andpd.DataFrame.from_dict
In this section, I will demonstrate examples where all these 3 methods work in an identical fashion, where some of them work better than others, and where some of them don't work at all.
Consider a very contrived example.
This list consists of "records" with every keys present. This is the simplest case you could encounter.
Word on Dictionary Orientations
Before continuing, it is important to make the distinction between the different types of dictionary orientations, and support with pandas. There are two primary types: "columns", and "index".
orient='columns'
Dictionaries with the "columns" orientation will have their keys correspond to columns in the equivalent DataFrame.
For example,
data
above is in the "columns" orient.Note: If you are using
pd.DataFrame.from_records
, the orientation is assumed to be "columns" (you cannot specify otherwise), and the dictionaries will be loaded accordingly.orient='index'
With this orient, keys are assumed to correspond to index values. This kind of data is best suited for
pd.DataFrame.from_dict
.This case is not considered in the OP, but is still useful to know.
Setting a Custom Index
If you need a custom index on the resultant DataFrame, you can set it using the
index=...
argument.This is not supported by
pd.DataFrame.from_dict
.Dealing with Missing Keys/Columns
All methods work out-of-the-box when handling dictionaries with missing keys/column values. For example,
Reading a Subset of Columns
"What if I don't want to read in every single column"? You can easily specify this using the
columns=...
parameter.For example, from the example dictionary of
data2
above, if you wanted to read only columns "A', 'D', and 'F', you can do so by passing a list:This is not supported by
pd.DataFrame.from_dict
with the default orient "columns".Reading a Subset of Rows
Not supported by any of these methods directly. You will have to iterate over your data and perform a reverse delete in-place as you iterate. For example, to extract only the 0th and 2nd rows from
data2
above, you can use:The panacea:
json_normalize
A strong, robust alternative to the methods outlined above is the
json_normalize
function which works with lists of dictionaries (records), and in addition can also handle nested dictionaries.Again, keep in mind that the data passed to
json_normalize
needs to be in the list-of-dictionaries (records) format.As mentioned,
json_normalize
can also handle nested dictionaries. Here's an example taken from the documentation.For more information on the
meta
andrecord_path
arguments, check out the documentation.Summarising
Here's a table of all the methods discussed above, along with supported features/functionality.
Supposing
d
is your list of dicts, simply: