Calculating numeric differences per group in panda

2019-08-12 16:20发布

My Dataframe has the following structure:

patient_id  |  timestamp  |  measurement
A           |  2014-10-10 |  5.7
A           |  2014-10-11 |  6.3
B           |  2014-10-11 |  6.1
B           |  2014-10-10 |  4.1

I would like to calculate a delta (difference) between each measurement of each patient.

The result should look like:

patient_id  |  timestamp  |  measurement  |    delta
A           |  2014-10-10 |  5.7          |     NaN
A           |  2014-10-11 |  6.3          |     0.6
B           |  2014-10-11 |  6.1          |     2.0
B           |  2014-10-10 |  4.1          |     NaN

How can this be done most-elegantly in pandas ?

1条回答
做自己的国王
2楼-- · 2019-08-12 16:50

Call transform on the 'measurement' column and pass the method diff, transform returns a series with an index aligned to the original df:

In [4]:

df['delta'] = df.groupby('patient_id')['measurement'].transform(pd.Series.diff)
df
Out[4]:
  patient_id   timestamp  measurement  delta
0          A  2014-10-10          5.7    NaN
1          A  2014-10-11          6.3    0.6
2          B  2014-10-10          4.1    NaN
3          B  2014-10-11          6.1    2.0

EDIT

If you are intending to apply some sorting on the result of transform then sort the df first:

In [10]:

df['delta'] = df.sort(columns=['patient_id', 'timestamp']).groupby('patient_id')['measurement'].transform(pd.Series.diff)
df
Out[10]:
  patient_id   timestamp  measurement  delta
0          A  2014-10-10          5.7    NaN
1          A  2014-10-11          6.3    0.6
2          B  2014-10-11          6.1    2.0
3          B  2014-10-10          4.1    NaN
查看更多
登录 后发表回答