Given the following pandas data frame:
ColA ColB ColC
0 a1 t 1
1 a2 t 2
2 a3 d 3
3 a4 d 4
I want to get a dictionary of dictionaries, but so far I have only managed to create the following:
d = {'t': [1, 2], 'd': [3, 4]}
by:
d = {k: list(v) for k,v in duplicated.groupby("ColB")["ColC"]}
How can I obtain a dict of dicts like this instead?
dd = {'t': {'a1': 1, 'a2': 2}, 'd': {'a3': 3, 'a4': 4}}
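To make the question reproducible, here is a minimal sketch that rebuilds the example frame and runs the attempted comprehension against it. It assumes `duplicated` in the question refers to this same frame, so it is called `df` here to match the answers. It also shows why the attempt loses the `ColA` keys.

```python
import pandas as pd

# Rebuild the example frame from the question (default RangeIndex 0..3).
df = pd.DataFrame({
    'ColA': ['a1', 'a2', 'a3', 'a4'],
    'ColB': ['t', 't', 'd', 'd'],
    'ColC': [1, 2, 3, 4],
})

# The attempt from the question: iterating a SeriesGroupBy yields
# (group key, Series of ColC values) pairs. Each group's Series is indexed
# by the original row labels (0..3), not by ColA, so the ColA keys are lost.
d = {k: list(v) for k, v in df.groupby('ColB')['ColC']}
print(d)  # {'d': [3, 4], 't': [1, 2]}
```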
You can do this with a groupby + apply step beforehand.
dd = df.set_index('ColA').groupby('ColB').apply(
lambda x: x.ColC.to_dict()
).to_dict()
Or, with a dict comprehension:
dd = {k: g.ColC.to_dict() for k, g in df.set_index('ColA').groupby('ColB')}
print(dd)
{'d': {'a3': 3, 'a4': 4}, 't': {'a1': 1, 'a2': 2}}
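To see why the groupby + apply version works, it can help to stop before the final `to_dict()`. In this minimal sketch, which rebuilds the example frame, the apply step returns a Series indexed by the `ColB` group keys whose values are the inner dicts, so calling `to_dict()` on it yields the nested result. Depending on your pandas version, `apply` may emit a deprecation warning about operating on the grouping column; the result is the same.

```python
import pandas as pd

df = pd.DataFrame({
    'ColA': ['a1', 'a2', 'a3', 'a4'],
    'ColB': ['t', 't', 'd', 'd'],
    'ColC': [1, 2, 3, 4],
})

# One inner dict per ColB group; set_index('ColA') makes ColA the keys
# that Series.to_dict() uses for each group's ColC values.
inner = df.set_index('ColA').groupby('ColB').apply(lambda x: x.ColC.to_dict())

print(inner.index.tolist())  # ['d', 't'] (group keys, sorted by default)

# The outer dict comes from the Series: group key -> inner dict.
dd = inner.to_dict()
print(dd)
```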
The point of this answer is to show that there is a straightforward way to do this with simple iteration and tools from the standard library.
Often we perform many transformations on a Pandas DataFrame, where each transformation constructs a new Pandas object. Sometimes this is an intuitive progression and makes perfect sense. However, there are times when we forget that we can use simpler tools, and I believe this is one of them. My answer still uses Pandas in that it relies on the itertuples method.
from collections import defaultdict
d = defaultdict(dict)
for a, b, c in df.itertuples(index=False):
d[b][a] = c
d = dict(d)
d
{'t': {'a1': 1, 'a2': 2}, 'd': {'a3': 3, 'a4': 4}}
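`defaultdict(dict)` creates the missing inner dict the first time a `ColB` value is seen. If you would rather avoid the final `dict(d)` conversion, the same loop can be sketched with a plain dict and `dict.setdefault`, which inserts an empty inner dict for a new key and returns the existing one otherwise. The example frame is rebuilt so the snippet runs on its own.

```python
import pandas as pd

df = pd.DataFrame({
    'ColA': ['a1', 'a2', 'a3', 'a4'],
    'ColB': ['t', 't', 'd', 'd'],
    'ColC': [1, 2, 3, 4],
})

d = {}
# With index=False each row unpacks positionally as (ColA, ColB, ColC).
for a, b, c in df.itertuples(index=False):
    # setdefault returns the inner dict for b, creating it on first sight.
    d.setdefault(b, {})[a] = c

print(d)  # {'t': {'a1': 1, 'a2': 2}, 'd': {'a3': 3, 'a4': 4}}
```

The trade-off is that `setdefault` builds a throwaway `{}` on every row, even when the key already exists; for a frame of this size that cost is negligible.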
A slight alternative: since the tuples we are iterating over are named tuples, we can access each element by the name of the column it represents.
from collections import defaultdict
d = defaultdict(dict)
for t in df.itertuples():
d[t.ColB][t.ColA] = t.ColC
d = dict(d)
d
{'t': {'a1': 1, 'a2': 2}, 'd': {'a3': 3, 'a4': 4}}
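Another variation, not from the original answer: zipping the three columns selected by name gives the same result without building a named tuple per row. Unlike the positional unpacking of `itertuples(index=False)`, it also does not depend on the frame's column order. This is a minimal sketch, assuming the same column names as in the question.

```python
from collections import defaultdict

import pandas as pd

df = pd.DataFrame({
    'ColA': ['a1', 'a2', 'a3', 'a4'],
    'ColB': ['t', 't', 'd', 'd'],
    'ColC': [1, 2, 3, 4],
})

d = defaultdict(dict)
# Columns are selected by name, so reordering the frame's columns is harmless.
for a, b, c in zip(df['ColA'], df['ColB'], df['ColC']):
    d[b][a] = c
d = dict(d)

print(d)  # {'t': {'a1': 1, 'a2': 2}, 'd': {'a3': 3, 'a4': 4}}
```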