Pandas DataFrame to dict of dicts

Posted 2019-03-25 07:31

Question:

Given the following pandas data frame:

  ColA ColB  ColC
0   a1    t     1
1   a2    t     2
2   a3    d     3
3   a4    d     4

I want to get a dictionary of dictionaries.

But I managed to create the following only:

d = {'t': [1, 2], 'd': [3, 4]}

by:

d = {k: list(v) for k, v in df.groupby("ColB")["ColC"]}

How could I obtain the dict of dict:

dd = {'t': {'a1': 1, 'a2': 2}, 'd': {'a3': 3, 'a4': 4}}

Answer 1:

You can do this by setting ColA as the index first, then applying a groupby + apply step, so that each group's ColC converts to a dict keyed by ColA.

dd = df.set_index('ColA').groupby('ColB').apply(
    lambda x: x.ColC.to_dict()
).to_dict()

Or, with a dict comprehension:

dd = {k : g.ColC.to_dict() for k, g in df.set_index('ColA').groupby('ColB')}

print(dd)
{'d': {'a3': 3, 'a4': 4}, 't': {'a1': 1, 'a2': 2}}
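For anyone who wants to run this end to end, here is a minimal sketch that rebuilds the frame from the question by hand (the construction of `df` is an assumption, since the question only shows its printed form). It selects `ColC` before iterating over the groups, which is a small adjustment rather than part of the answer above, so each group is already a Series indexed by `ColA`.

```python
import pandas as pd

# Rebuild the example frame from the question (assumed construction).
df = pd.DataFrame({
    "ColA": ["a1", "a2", "a3", "a4"],
    "ColB": ["t", "t", "d", "d"],
    "ColC": [1, 2, 3, 4],
})

# Index by ColA so that each group's ColC Series is keyed by ColA.
indexed = df.set_index("ColA")

# Iterating a SeriesGroupBy yields (group key, Series) pairs.
dd = {key: group.to_dict() for key, group in indexed.groupby("ColB")["ColC"]}

print(dd)
```

The outer keys come out in sorted order because `groupby` sorts group keys by default; passing `sort=False` should keep them in order of first appearance instead.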


Answer 2:

The point of this answer is to show that there is a straightforward way to do this with simple iteration and tools from the standard library.

We often perform many transformations on a Pandas DataFrame, where each transformation constructs a new Pandas object. At times this is an intuitive progression and makes perfect sense. However, there are times when we forget that we can use simpler tools. I believe this is one of those times. My answer still uses Pandas in that I use the itertuples method.

from collections import defaultdict

d = defaultdict(dict)

for a, b, c in df.itertuples(index=False):
    d[b][a] = c

d = dict(d)

d

{'t': {'a1': 1, 'a2': 2}, 'd': {'a3': 3, 'a4': 4}}
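One thing to watch with the positional unpacking above: `for a, b, c in ...` relies on the frame having exactly these three columns in this order. A possible safeguard, sketched below under the assumption that the frame may carry its columns in another order, is to select the columns explicitly before iterating. Passing `name=None` makes `itertuples` yield plain tuples, which is all the unpacking needs.

```python
from collections import defaultdict

import pandas as pd

# Same data as the question, but with the columns deliberately shuffled
# (an assumed scenario, to show why explicit selection helps).
df = pd.DataFrame({
    "ColC": [1, 2, 3, 4],
    "ColA": ["a1", "a2", "a3", "a4"],
    "ColB": ["t", "t", "d", "d"],
})

d = defaultdict(dict)

# Selecting the columns fixes the unpacking order regardless of df's layout.
for a, b, c in df[["ColA", "ColB", "ColC"]].itertuples(index=False, name=None):
    d[b][a] = c

d = dict(d)
print(d)
```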

A slight alternative: since the tuples we are iterating over are named tuples, we can access each element by the name of the column it represents.

from collections import defaultdict

d = defaultdict(dict)

for t in df.itertuples():
    d[t.ColB][t.ColA] = t.ColC

d = dict(d)

d

{'t': {'a1': 1, 'a2': 2}, 'd': {'a3': 3, 'a4': 4}}
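If this pattern comes up repeatedly, the loop can be wrapped in a small helper. The sketch below is only an illustration: `nested_dict` and its parameter names are hypothetical, not something Pandas provides. It iterates over the three columns with `zip` instead of `itertuples`, which is a swapped-in technique that avoids relying on column names being valid attribute names.

```python
from collections import defaultdict

import pandas as pd


def nested_dict(frame, outer, inner, value):
    """Hypothetical helper: build {outer: {inner: value}} from three columns."""
    result = defaultdict(dict)
    # zip walks the three columns in lockstep, one row at a time.
    for o, i, v in zip(frame[outer], frame[inner], frame[value]):
        result[o][i] = v  # a repeated (outer, inner) pair keeps the last value
    return dict(result)


# Example frame rebuilt from the question (assumed construction).
df = pd.DataFrame({
    "ColA": ["a1", "a2", "a3", "a4"],
    "ColB": ["t", "t", "d", "d"],
    "ColC": [1, 2, 3, 4],
})

dd = nested_dict(df, outer="ColB", inner="ColA", value="ColC")
print(dd)
```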