how to add new column based on the above row's

I have one dataframe as below. At first,they have three columns('date','time','flag'). I want to add one column which based on the flag and date which means when I get flag=1 ,then the rest of this day the target is 1, otherwise the target is zero.

  date        time      flag  target
0 2017/4/10   10:00:00  0     0
1 2017/4/10   11:00:00  1     1
2 2017/4/10   12:00:00  0     1
3 2017/4/10   13:00:00  0     1
4 2017/4/10   14:00:00  0     1
5 2017/4/11   10:00:00  1     1
6 2017/4/11   11:00:00  0     1
7 2017/4/11   12:00:00  1     1
8 2017/4/11   13:00:00  1     1
9 2017/4/11   14:00:00  0     1
10 2017/4/12  10:00:00  0     0
11 2017/4/12  11:00:00  0     0
12 2017/4/12  12:00:00  0     0
13 2017/4/12  13:00:00  0     0
14 2017/4/12  14:00:00  0     0
15 2017/4/13  10:00:00  0     0
16 2017/4/13  11:00:00  1     1
17 2017/4/13  12:00:00  0     1
18 2017/4/13  13:00:00  1     1
19 2017/4/13  14:00:00  0     1

标签： python pandas dataframe pandas-groupby

2条回答

Evening l夕情丶

2楼-- · 2019-05-06 07:18

Use DataFrameGroupBy.cumsum for cumulative sum flag values, compare with 0 and last cast mask to integer:

df['new'] = (df.groupby('date')['flag'].cumsum() > 0).astype(int)
print (df)
         date      time  flag  target  new
0   2017/4/10  10:00:00     0       0    0
1   2017/4/10  11:00:00     1       1    1
2   2017/4/10  12:00:00     0       1    1
3   2017/4/10  13:00:00     0       1    1
4   2017/4/10  14:00:00     0       1    1
5   2017/4/11  10:00:00     1       1    1
6   2017/4/11  11:00:00     0       1    1
7   2017/4/11  12:00:00     1       1    1
8   2017/4/11  13:00:00     1       1    1
9   2017/4/11  14:00:00     0       1    1
10  2017/4/12  10:00:00     0       0    0
11  2017/4/12  11:00:00     0       0    0
12  2017/4/12  12:00:00     0       0    0
13  2017/4/12  13:00:00     0       0    0
14  2017/4/12  14:00:00     0       0    0
15  2017/4/13  10:00:00     0       0    0
16  2017/4/13  11:00:00     1       1    1
17  2017/4/13  12:00:00     0       1    1
18  2017/4/13  13:00:00     1       1    1
19  2017/4/13  14:00:00     0       1    1

0人赞添加讨论(0) 举报

贼婆χ

3楼-- · 2019-05-06 07:33

Okay, I know that we've already found a solution here but just to satisfy the nerd in me, here's an answer (not elegant given how long it is) to avoid that nagging first-row flaw

pd.merge(df, (df.groupby('date')['flag'].any().astype(int)).to_frame().T.transpose().reset_index(), left_on='date', right_on='date')

Approach remains the same as @jezrael - the groupby function is key here. Instead of using the cumsum, which leads to the first-row flaw, any() appears to fit really well into this solution. The only drawback is that it produces a series, which we then need to coerce back into a dataframe and transpose before joining them together by the date key.

0人赞添加讨论(0) 举报

how to add new column based on the above row's

采纳回答

编辑标签

举报内容

检举类型

检举原因

检举说明(必填)

打开微信“扫一扫”，打开网页后点击屏幕右上角分享按钮

付费偷看金额在0.1-10元之间