How to pivot a dataframe with pandas correctly

Posted 2019-08-20 12:16

I have a dataframe df1 that looks like this

                     rootID   parentID    jobID  time
                  0    A         A          B    2019-01-30 14:33:21.339469
                  1    A         A          C    2019-01-30 14:33:21.812381
                  2    A         C          D    2019-01-30 15:33:21.812381
                  3    E         E          F    2019-01-30 15:33:21.812381
                  4    E         F          G    2019-01-30 16:33:21.812381
                  5    E         F          H    2019-01-30 17:33:21.812381
                  6    E         G          I    2019-01-30 18:33:21.812381

I want to pivot this dataframe to the following form (df2)

                     rootID   subID1      subID2   subID3  #subFlows
                  0    A         B                             1
                  1    A         C          D                  2  
                  3    E         F          G         I        3
                  4    E         F          H                  2

I have tried

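         # Number the rows within each rootID (1, 2, 3, ...) and pivot on that counter
         df2 = (df1.assign(g=df1.groupby('rootID').cumcount().add(1))
                .pivot(index='rootID', columns='g', values='jobID')
                .add_prefix('subID')
                .fillna("")
                .reset_index())

         # Count the non-empty cells per row, minus one for the rootID column
         df2['#subFlows'] = (df2 != "").sum(axis=1).astype(int).sub(1)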

Which returns a dataframe like

                     rootID   subID1      subID2   subID3 
                  0    A         B          C        D    
                  1    E         F          G        H    

But I want, as described above, to separate the non-nested subIDs, so that each chain of nested jobs (from the root down to a job with no children) gets its own row.

Any idea how I would do this?
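
One possible direction, offered as a rough sketch rather than a definitive answer: the rows of df2 look like root-to-leaf paths through the parentID -> jobID relation, so they could be collected with a depth-first walk instead of a pivot. The helper name `root_to_leaf_paths` is made up for illustration. The sketch assumes job IDs are unique across roots, that the hierarchy has no cycles, and that no row has parentID equal to jobID. The time column is left out because df2 does not use it.

```python
import pandas as pd

# Sample data from the question (time column omitted; df2 does not need it)
df1 = pd.DataFrame({
    "rootID":   ["A", "A", "A", "E", "E", "E", "E"],
    "parentID": ["A", "A", "C", "E", "F", "F", "G"],
    "jobID":    ["B", "C", "D", "F", "G", "H", "I"],
})


def root_to_leaf_paths(df):
    """Hypothetical helper: return one [root, job1, job2, ...] list per leaf job.

    Assumes unique job IDs, an acyclic hierarchy, and parentID != jobID.
    """
    # Map each parent to its direct children, keeping the original row order
    children = df.groupby("parentID", sort=False)["jobID"].agg(list).to_dict()
    rows = []

    def walk(root, node, path):
        kids = children.get(node, [])
        if not kids:
            # No children: this node is a leaf, so the path is complete
            rows.append([root] + path)
            return
        for kid in kids:
            walk(root, kid, path + [kid])

    for root in df["rootID"].unique():
        walk(root, root, [])
    return rows


rows = root_to_leaf_paths(df1)
width = max(len(r) for r in rows) - 1  # longest path, not counting the root

# Pad shorter paths with empty strings so every row has the same width
df2 = pd.DataFrame(
    [r + [""] * (width + 1 - len(r)) for r in rows],
    columns=["rootID"] + [f"subID{i}" for i in range(1, width + 1)],
)

# Path length = number of non-empty subID cells
df2["#subFlows"] = (df2.drop(columns="rootID") != "").sum(axis=1)
print(df2)
```

On the sample data this gives four rows (A/B, A/C/D, E/F/G/I, E/F/H) with #subFlows of 1, 2, 3 and 2. The recursion would need replacing with an explicit stack for very deep hierarchies.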
