pandas python flag transactions across rows

2019-08-06 09:59发布

问题:

I have a data as below. I would like to flag transactions -

when a same employee has one of the ('Car Rental', 'Car Rental - Gas' in the column expense type) and 'Car Mileage' on the same day - so in this case employee a and c's transactions would be highlighted. Employee b's transactions won't be highlighted as they don't meet the condition - he doesn't have a 'Car Mileage'

i want the column zflag. Different numbers in that column indicate group of instances when the above condition was met

d = {'emp': ['a',   'a',    'a',    'a',    'b',    'b',    'b',    'c',    'c',    'c',    'c' ], 
 'date': ['1',  '1',    '1',    '1',    '2',    '2',    '2',    '3',    '3',    '3',    '3' ], 
 'usd':[1,  2,  3,  4,  5,  6,  7,  8,  9,  10,     11 ], 
 'expense type':['Car Mileage',     'Car Rental',   'Car Rental - Gas',     'food',     'Car Rental',   'Car Rental - Gas',     'food',     'Car Mileage',  'Car Rental',   'food',     'wine' ],
 'zflag':['1',  '1', '1',   ' ',' ',' ',' ','2','2',' ',' ' ]
 }

df = pd.DataFrame(data=d)



    df
Out[253]: 
   date emp      expense type  usd zflag
0     1   a       Car Mileage    1     1
1     1   a        Car Rental    2     1
2     1   a  Car Rental - Gas    3     1
3     1   a              food    4      
4     2   b        Car Rental    5      
5     2   b  Car Rental - Gas    6      
6     2   b              food    7      
7     3   c       Car Mileage    8     2
8     3   c        Car Rental    9     2
9     3   c              food   10      
10    3   c              wine   11      

I would appreciate if i could get pointers regarding functions to use. I am thinking of using groupby...but not sure

I understand that date+emp will be my primary key

回答1:

Here is an approach. It's not the cleanest but what you're describing is quite specific. Some of this might be able to be simplified with a function.

temp_df = df.groupby(["emp", "date"], axis=0)["expense type"].apply(lambda x: 1 if "Car Mileage" in x.values and any([k in x.values for k in ["Car Rental", "Car Rental - Gas"]]) else 0).rename("zzflag")
temp_df = temp_df.loc[temp_df!=0,:].cumsum()
final_df = pd.merge(df, temp_df.reset_index(), how="left").fillna(0)

Steps:

  • Groupby emp/date and search for criteria, 1 if met, 0 if not

  • Remove rows with 0's and cumsum to produce unique values

  • Join back to the original frame

Edit:

To answer your question below. The join works because after you run .reset_index() that takes "emp" and "date" from the index and moves them to columns.