pandas: concatenate two DataFrames with sorted Mul

Posted 2019-06-02 21:51

Question:

How can I concatenate two DataFrames with sorted MultiIndexes so that the result also has a sorted MultiIndex?

Since both are already sorted, the algorithm should have linear complexity in the total number of rows across both DataFrames. That is the complexity of merging two sorted lists, which is effectively the problem here.
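To make the complexity target concrete, here is a minimal sketch of the two-pointer merge that the linear bound refers to. It uses plain Python tuples as stand-ins for MultiIndex keys; the function name merge_sorted and the sample keys are illustrative assumptions, not anything pandas provides.

```python
def merge_sorted(a, b):
    """Merge two already-sorted lists in O(len(a) + len(b)) comparisons."""
    out = []
    i = j = 0
    # Advance whichever side currently holds the smaller key.
    while i < len(a) and j < len(b):
        if a[i] <= b[j]:
            out.append(a[i])
            i += 1
        else:
            out.append(b[j])
            j += 1
    # At most one of these tails is non-empty; it is already sorted.
    out.extend(a[i:])
    out.extend(b[j:])
    return out


# Hypothetical (i1, i2) keys; tuples compare lexicographically,
# which mirrors how a sorted MultiIndex is ordered.
keys1 = [(0, 0), (0, 1), (1, 0), (1, 1), (2, 0), (2, 1)]
keys2 = [(0, 2), (0, 3), (1, 2), (1, 3), (2, 2), (2, 3)]
merged = merge_sorted(keys1, keys2)
```

Each element is appended exactly once, so the work grows with the combined length of the inputs rather than with n log n as a full re-sort would.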

Example:

import pandas as pd
t1 = pd.DataFrame(data={'i1':[0,0,1,1,2,2],
                        'i2':[0,1,0,1,0,1],
                        'x':[1.,2.,3.,4.,5.,6.]})
t1.set_index(['i1','i2'], inplace=True)
t1.sort_index(inplace=True)
t2 = pd.DataFrame(data={'i1':[0,0,1,1,2,2],
                        'i2':[2,3,2,3,2,3],
                        'x':[7.,8.,9.,10.,11.,12.]})
t2.set_index(['i1','i2'], inplace=True)
t2.sort_index(inplace=True)
>>> print(t1)
         x
i1 i2     
0  0   1.0
   1   2.0
1  0   3.0
   1   4.0
2  0   5.0
   1   6.0

>>> print(t2)
          x
i1 i2      
0  2    7.0
   3    8.0
1  2    9.0
   3   10.0
2  2   11.0
   3   12.0

Expected result:

          x
i1 i2      
0  0    1.0
   1    2.0
   2    7.0
   3    8.0
1  0    3.0
   1    4.0
   2    9.0
   3   10.0
2  0    5.0
   1    6.0
   2   11.0
   3   12.0

Thank you for your help!

Answer 1:

Here is a candidate answer. I am still working to confirm its algorithmic efficiency, so please comment if you have an opinion:

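def linConcat(t1, t2):
    # Reindex t1 onto the union of both indexes; rows present only in t2 start as NaN.
    t = t1.reindex( index=t1.index.union(t2.index) )
    # Fill in the rows that come from t2 (these also overwrite any keys shared with t1).
    t.loc[t2.index,:] = t2
    return t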
>>> linConcat(t1, t2)
          x
i1 i2      
0  0    1.0
   1    2.0
   2    7.0
   3    8.0
1  0    3.0
   1    4.0
   2    9.0
   3   10.0
2  0    5.0
   1    6.0
   2   11.0
   3   12.0
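One way to check the correctness of any candidate (though not its running time) is to compare it against a simple baseline. The sketch below is only a baseline under stated assumptions: it uses pd.concat followed by sort_index, which re-sorts the result and therefore is not guaranteed to meet the linear-time goal. The helper names make_frame and is_valid_sorted_concat are hypothetical.

```python
import pandas as pd


def make_frame(i2_values, x_values):
    """Build one of the example frames, indexed and sorted by (i1, i2)."""
    frame = pd.DataFrame({'i1': [0, 0, 1, 1, 2, 2],
                          'i2': i2_values,
                          'x': x_values})
    return frame.set_index(['i1', 'i2']).sort_index()


def is_valid_sorted_concat(result, t1, t2):
    """Return True if result has a sorted index and holds every row of t1 and t2."""
    expected = pd.concat([t1, t2]).sort_index()
    return bool(result.index.is_monotonic_increasing) and result.equals(expected)


t1 = make_frame([0, 1, 0, 1, 0, 1], [1., 2., 3., 4., 5., 6.])
t2 = make_frame([2, 3, 2, 3, 2, 3], [7., 8., 9., 10., 11., 12.])

# Baseline: concatenate, then sort the combined MultiIndex.
baseline = pd.concat([t1, t2]).sort_index()
baseline_ok = is_valid_sorted_concat(baseline, t1, t2)
```

The same check can be applied to the output of linConcat(t1, t2); timing both approaches on progressively larger inputs would then give an empirical view of how each one scales.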