Pandas pivot with specified (additional) columns

2019-07-21 04:57发布

问题:

I have a two different list of tuples that are converted into a pandas dataframe:

ls1 = [(1,"A",2),(1,"B",1),(1,"C",3),(2,"A",4),(2,"B",4,),(2,"C",5)]
ls2 = [(1,"A",2),(1,"C",3),(1,"B",1),(1,"D",6),(2,"A",4),(2,"C",5),(2,"B",4,)]
df1 = pandas.DataFrame(ls1, columns=['ID', 'Class', 'count'])
df2 = pandas.DataFrame(ls2, columns=['ID', 'Class', 'count'])

Now I'd like to make a pivot table from both dataframes with the column names for the classes "A", "B", "C", "D". So, all four column names (if possible also in a specified order) should exist in the resulting pivot table. If there are no counts for a ID-Class combination these should be filled with e.g. NaN.

dfpivot1 = df1.pivot(index='ID', columns='Class', values='count')
dfpivot2 = df2.pivot(index='ID', columns='Class', values='count')

>>> dfpivot1
Class  A  B  C
ID            
1      2  1  3
2      4  4  5
>>> 

Using the .pivot from pandas provides a pivot table for df1, but with only three Class-columns ("A","B","C"). So, dfpivot1 needs to be modified to have the columns "A","B","C" AND "D" and thus, exactly match the columns of dfpivot2. As I am starting from a list of tuples also other approaches (without using pandas) might be interesting.

回答1:

I think you need add reindex_axis or reindex for fill NaNs for all missing columns:

cols = ['A','B','C','D']
dfpivot1 = df1.pivot(index='ID', columns='Class', values='count').reindex_axis(cols, axis=1)
print (dfpivot1)
Class  A  B  C   D
ID                
1      2  1  3 NaN
2      4  4  5 NaN

Also is possible specify fill_value parameter:

cols = ['A','B','C','D']
dfpivot1 = df1.pivot(index='ID', columns='Class', values='count')
              .reindex_axis(cols, fill_value=0, axis=1)
print (dfpivot1)
Class  A  B  C  D
ID               
1      2  1  3  0
2      4  4  5  0

cols = ['A','B','C','D']
dfpivot1 = df1.pivot(index='ID', columns='Class', values='count').reindex(columns=cols)
print (dfpivot1)
Class  A  B  C   D
ID                
1      2  1  3 NaN
2      4  4  5 NaN