Pandas: adding multiindex Series/Dataframes contai

Posted 2019-08-31 05:15

Question:

How do I add / merge two MultiIndex Series/DataFrames that contain lists as elements (a port sequence or timestamp sequence in my case)? In particular, how do I deal with indices that appear in only one Series/DataFrame? Unfortunately, the .add() method accepts only floats for the fill_value argument, not empty lists.

My Data:

print(series1)
print(series2)

IP               sessionID
195.12*.21*.11*  49                    [5900]
                 50         [5900, 5900, 5900, 5900, ...

IP               sessionID
85.15*.24*.12*   63                    [3389]
91.20*.4*.14*    68           [445, 445, 139]
113.9*.4*.16*    75                 [23, 210]
195.12*.21*.11*  49                    [5905]

Expected result:

IP               sessionID
195.12*.21*.11*  49              [5900, 5905]
                 50         [5900, 5900, 5900, 5900, ...
85.15*.24*.12*   63                    [3389]
91.20*.4*.14*    68           [445, 445, 139]
113.9*.4*.16*    75                 [23, 210]

Oddly enough, series1.add(series1) or series2.add(series2) does work and appends the lists as expected, but series1.add(series2) produces runtime errors. series1.combine_first(series2) works too, but it does not merge the lists: for each index it simply takes one of them. Any ideas?

Yes, I know that lists as elements are bad style, but that's the way my data is right now. Sorry for that. To keep it short I have posted only the Series example; let me know if you also need the DataFrame example.

Answer 1:

In case there is any other poor soul out there who needs this info... It seems like a dirty workaround, but it works:

# add() works for mutual indices, so find intersection and call it
# fortunately, it appends list2 to list1!
intersection = series1.index.intersection(series2.index)
inter1 = series1[series1.index.isin(intersection)]
inter2 = series2[series2.index.isin(intersection)]
interAppend = inter1.add(inter2)

# combine_first() unions indices and keeps the values of the caller,
# so it will keep the appended lists on mutual indices,
# while it adds new indices and corresponding values
exclusiveAdd = interAppend.combine_first(series1).combine_first(series2)
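
For readers who would rather not depend on how add() treats list elements, here is a minimal sketch of an alternative, not the approach from the answer above. It reindexes both Series to the union of their indices and concatenates the lists in plain Python, treating a missing entry as an empty list. The helper name merge_list_series is made up for illustration, the sample data is only loosely modeled on the question (the truncated list for session 50 is replaced by an invented short one), and the sketch assumes both indices are unique.

```python
import pandas as pd


def merge_list_series(left, right):
    """Concatenate list-valued Series element-wise over the union of their indices.

    Hypothetical helper for illustration; assumes both indices are unique.
    """
    union = left.index.union(right.index)

    def as_list(value):
        # reindex() fills keys missing from one side with NaN;
        # anything that is not a list is treated as an empty list.
        return value if isinstance(value, list) else []

    left_vals = left.reindex(union)
    right_vals = right.reindex(union)
    merged = [as_list(a) + as_list(b) for a, b in zip(left_vals, right_vals)]
    return pd.Series(merged, index=union)


# Toy data: the values for session 50 are invented stand-ins.
index1 = pd.MultiIndex.from_tuples(
    [("195.12*.21*.11*", 49), ("195.12*.21*.11*", 50)],
    names=["IP", "sessionID"],
)
series1 = pd.Series([[5900], [5900, 5900]], index=index1)

index2 = pd.MultiIndex.from_tuples(
    [
        ("85.15*.24*.12*", 63),
        ("91.20*.4*.14*", 68),
        ("113.9*.4*.16*", 75),
        ("195.12*.21*.11*", 49),
    ],
    names=["IP", "sessionID"],
)
series2 = pd.Series([[3389], [445, 445, 139], [23, 210], [5905]], index=index2)

result = merge_list_series(series1, series2)
print(result)
```

With this data, the shared key ("195.12*.21*.11*", 49) ends up with the list from series1 followed by the list from series2, and every key that exists in only one Series keeps its list unchanged. Note that the resulting index is the sorted union, so the row order may differ from the expected output shown in the question.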