Vectorized solution to conditional dataframe selection

Posted 2019-08-13 03:52

Question:

I recently asked a question that was answered, "How do I add conditionally to a selection of cells in a pandas dataframe column when the column is a series of lists?", but I believe I have a new problem which I had not previously considered.

In the following dataframe I need two conditions to result in a change to column d. Each value in column d is a list.

  • Where a == b, the final integer in d is incremented by one.
  • Where a != b, the list in d is extended by appending the value 1.

    a       b       c           d           
    On      On      [0]         [0,3]       
    On      Off     [0]         [0,1]
    On      On      [0]         [2]         
    On      On      [0]         [0,4,4]         
    On      Off     [0]         [0]
    
As a result, the dataframe would look like this:

    a       b       c       d       
    On      On      [0]     [0,4]       
    On      Off     [0]     [0,1,1]     
    On      On      [0]     [3]
    On      On      [0]     [0,4,5] 
    On      Off     [0]     [0,1]
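The two rules can be checked in plain Python before pandas comes into it. The sketch below is only an illustration: `update` is a hypothetical helper name, not something from the question, and the rows are copied from the tables above.

```python
def update(a, b, d):
    """Return a new list for column d according to the two rules."""
    if a == b:
        # Rule 1: increment the final integer.
        return d[:-1] + [d[-1] + 1]
    # Rule 2: append the value 1.
    return d + [1]

# (a, b, d) for each row of the input table; column c is not involved.
rows = [
    ("On", "On", [0, 3]),
    ("On", "Off", [0, 1]),
    ("On", "On", [2]),
    ("On", "On", [0, 4, 4]),
    ("On", "Off", [0]),
]

result = [update(a, b, d) for a, b, d in rows]
print(result)  # [[0, 4], [0, 1, 1], [3], [0, 4, 5], [0, 1]]
```

Because `update` builds a new list each time, the input rows are left unchanged, which makes the result easy to compare against the expected table.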
    

I realise that this can be done using the pd.Series.apply method with a predefined function or a lambda. However, the data frame consists of 100000 rows, and I was hoping that a vectorized solution to these two conditions might exist.
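For reference, an apply-based baseline of the kind mentioned above might look like the following. It is a minimal sketch, assuming a recent pandas in which `DataFrame.apply(..., axis=1)` returns a Series when the function returns lists. It uses the row-wise `DataFrame.apply` rather than `pd.Series.apply` because each update needs columns a, b and d together. `update_row` is a hypothetical helper name.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "a": ["On"] * 5,
    "b": ["On", "Off", "On", "On", "Off"],
    "c": [[0] for _ in range(5)],
    "d": [[0, 3], [0, 1], [2], [0, 4, 4], [0]],
})

def update_row(row):
    # Hypothetical helper: builds a new list instead of mutating the old one.
    d = row["d"]
    if row["a"] == row["b"]:
        return d[:-1] + [d[-1] + 1]  # increment the final integer
    return d + [1]                   # append the value 1

df["d"] = df.apply(update_row, axis=1)
print(df["d"].tolist())
```

This calls a Python function once per row, which is the cost the question hopes to avoid.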

Answer 1:

As Edchum says, a vectorised solution can be problematic.
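One way to see the difficulty: a column of lists is stored with dtype `object`, and the lists have different lengths, so the column cannot be treated as a single rectangular numeric array. A small sketch, using the values of column d from the question:

```python
import pandas as pd

# Column d from the question, as a standalone Series.
d = pd.Series([[0, 3], [0, 1], [2], [0, 4, 4], [0]])

print(d.dtype)              # object: each cell holds a Python list
print(d.map(len).tolist())  # [2, 2, 1, 3, 1]: the lengths are ragged
```

Any per-list operation on such a column ends up running Python code for each element, whichever pandas method is used to express it.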

One non-vectorized solution uses apply with custom functions:

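import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    'a': ['On'] * 5,
    'b': ['On', 'Off', 'On', 'On', 'Off'],
    'c': [[0] for _ in range(5)],
    'd': [[0, 3], [0, 1], [2], [0, 4, 4], [0]],
})

# Temporary column holding the original lists, so both updates read unmodified values.
df['e'] = df['d']

def exten(lst):
    # Return a new list with 1 appended.
    return lst + [1]

def incre(lst):
    # Return a new list with the final integer incremented (no in-place mutation).
    return lst[:-1] + [lst[-1] + 1]

df.loc[df.a != df.b, 'd'] = df.e.apply(exten)
df.loc[df.a == df.b, 'd'] = df.e.apply(incre)
df = df.drop('e', axis=1)
print(df)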
    a    b    c          d
0  On   On  [0]     [0, 4]
1  On  Off  [0]  [0, 1, 1]
2  On   On  [0]        [3]
3  On   On  [0]  [0, 4, 5]
4  On  Off  [0]     [0, 1]
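If `apply` turns out to be slow on 100000 rows, a plain list comprehension over the zipped columns is a common alternative. It is still a Python-level loop rather than a true vectorised operation, so treat the sketch below as something to benchmark on the real data, not as a guaranteed speed-up.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "a": ["On"] * 5,
    "b": ["On", "Off", "On", "On", "Off"],
    "c": [[0] for _ in range(5)],
    "d": [[0, 3], [0, 1], [2], [0, 4, 4], [0]],
})

# Build the new lists in one pass over the three columns.
new_d = [
    lst[:-1] + [lst[-1] + 1] if a == b else lst + [1]
    for a, b, lst in zip(df["a"], df["b"], df["d"])
]

# Wrap in a Series with the frame's index so the rows line up explicitly.
df["d"] = pd.Series(new_d, index=df.index)
print(df)
```

Each branch builds a new list, so the original lists are not modified in place.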