Python 2.7: shift a dataframe by day and a column

Posted 2019-08-20 13:40

I have a dataframe named df1 as follows:

df1:

               a   b    id
2010-01-01     2   3    21
2010-01-01     2   4    22
2010-01-01     3   5    23
2010-01-01     4   6    24
2010-01-02     1   4    21
2010-01-02     2   5    22
2010-01-02     3   6    23
2010-01-02     4   7    24
2010-01-03     1   8    21
2010-01-03     2   9    22
2010-01-03     3   10   23
2010-01-03     4   11   24
...........................

I want to shift the values of a, b and id forward, so that the values from one row become the values of a later row. As df1 shows, each date has several rows, each with a different id. The shift should go by date and id: the 2010-01-02 values for id 21 should become the 2010-01-03 values for id 21, and so on. Thanks!

My desired answer:

               a   b    id
2010-01-01     NaN NaN  NaN
2010-01-01     NaN NaN  NaN
2010-01-01     NaN NaN  NaN
2010-01-01     NaN NaN  NaN
2010-01-02     2   3    21
2010-01-02     2   4    22
2010-01-02     3   5    23
2010-01-02     4   6    24
2010-01-03     1   4    21
2010-01-03     2   5    22
2010-01-03     3   6    23
2010-01-03     4   7    24
...........................

2 answers
Bombasti
#2 · 2019-08-20 14:17

If all groups have the same length (4 in the sample) and the DatetimeIndex is sorted:

# Count the rows that share the first date, then shift every column down by that many rows
df2 = df.shift((df.index == df.index[0]).sum())
print(df2)
              a    b    id
2010-01-01  NaN  NaN   NaN
2010-01-01  NaN  NaN   NaN
2010-01-01  NaN  NaN   NaN
2010-01-01  NaN  NaN   NaN
2010-01-02  2.0  3.0  21.0
2010-01-02  2.0  4.0  22.0
2010-01-02  3.0  5.0  23.0
2010-01-02  4.0  6.0  24.0
2010-01-03  1.0  4.0  21.0
2010-01-03  2.0  5.0  22.0
2010-01-03  3.0  6.0  23.0
2010-01-03  4.0  7.0  24.0
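
If the groups might not all have the same length, a per-id shift is one possible alternative. The sketch below is not from the answer above: it rebuilds the sample frame from the question and assumes the rows are sorted by date within each id. The id column is not shifted here (it is copied back as-is), so the first day keeps its ids, unlike the desired output where id is NaN as well.

```python
import pandas as pd

# Sample data rebuilt from the question (three dates, four ids per date)
idx = pd.to_datetime(["2010-01-01"] * 4 + ["2010-01-02"] * 4 + ["2010-01-03"] * 4)
df = pd.DataFrame(
    {
        "a": [2, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4],
        "b": [3, 4, 5, 6, 4, 5, 6, 7, 8, 9, 10, 11],
        "id": [21, 22, 23, 24] * 3,
    },
    index=idx,
)

# Shift a and b by one row within each id, so each id gets its previous date's values
shifted = df.groupby("id")[["a", "b"]].shift(1)

# Copy the ids back positionally (the index has duplicate dates, so avoid label alignment)
shifted["id"] = df["id"].values

print(shifted)
```

This only relies on each id's rows being in date order, not on every date having the same number of rows.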

But if you need to shift the index values by one day instead:

# freq='D' moves the index forward by one day and leaves the values where they are
df3 = df.shift(1, freq='D')
print(df3)
            a   b  id
2010-01-02  2   3  21
2010-01-02  2   4  22
2010-01-02  3   5  23
2010-01-02  4   6  24
2010-01-03  1   4  21
2010-01-03  2   5  22
2010-01-03  3   6  23
2010-01-03  4   7  24
2010-01-04  1   8  21
2010-01-04  2   9  22
2010-01-04  3  10  23
2010-01-04  4  11  24
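
To make the difference between the two calls concrete, here is a small sketch on the sample data from the question. It compares a positional shift, which keeps the index and moves the values, with a shift that passes freq, which keeps the values and moves the index. The names by_rows and by_freq are made up for this illustration.

```python
import pandas as pd

# Sample data rebuilt from the question (three dates, four ids per date)
idx = pd.to_datetime(["2010-01-01"] * 4 + ["2010-01-02"] * 4 + ["2010-01-03"] * 4)
df = pd.DataFrame(
    {
        "a": [2, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4],
        "b": [3, 4, 5, 6, 4, 5, 6, 7, 8, 9, 10, 11],
        "id": [21, 22, 23, 24] * 3,
    },
    index=idx,
)

# Positional shift: same index, values moved down four rows, NaN at the top
by_rows = df.shift(4)

# Frequency shift: same values in the same order, every date moved forward one day
by_freq = df.shift(1, freq="D")

print(by_rows.index.equals(df.index))
print(by_freq.index.min(), by_freq.index.max())
```

The positional version drops the last day's values and turns the integer columns into floats because of the NaN rows. The freq version loses nothing but no longer has any rows for the first date.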
等我变得足够好
#3 · 2019-08-20 14:26

One way is to use shape, provided the dates are sorted, i.e.:

# Shift by the number of rows that share the first date
df.shift(df.loc[df.index[0]].shape[0])
# Or with len
df.shift(len(df.loc[df.index[0]]))

Output:

              a    b    id
2010-01-01  NaN  NaN   NaN
2010-01-01  NaN  NaN   NaN
2010-01-01  NaN  NaN   NaN
2010-01-01  NaN  NaN   NaN
2010-01-02  2.0  3.0  21.0
2010-01-02  2.0  4.0  22.0
2010-01-02  3.0  5.0  23.0
2010-01-02  4.0  6.0  24.0
2010-01-03  1.0  4.0  21.0
2010-01-03  2.0  5.0  22.0
2010-01-03  3.0  6.0  23.0
2010-01-03  4.0  7.0  24.0
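
Both positional approaches depend on the index being sorted and on every date having the same number of rows. Note also that df.loc[df.index[0]] returns a Series when the first date appears only once, in which case its length is the number of columns rather than 1. A minimal sketch of a guard, using a hypothetical helper named rows_per_date that is not part of pandas or of the answers above:

```python
import pandas as pd


def rows_per_date(df):
    """Return the common number of rows per index value, or raise if the layout differs."""
    if not df.index.is_monotonic_increasing:
        raise ValueError("index is not sorted")
    counts = df.index.value_counts()
    if counts.nunique() != 1:
        raise ValueError("dates have different numbers of rows")
    return int(counts.iloc[0])


# Sample data rebuilt from the question (three dates, four ids per date)
idx = pd.to_datetime(["2010-01-01"] * 4 + ["2010-01-02"] * 4 + ["2010-01-03"] * 4)
df = pd.DataFrame(
    {
        "a": [2, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4],
        "b": [3, 4, 5, 6, 4, 5, 6, 7, 8, 9, 10, 11],
        "id": [21, 22, 23, 24] * 3,
    },
    index=idx,
)

# Shift by one whole day's worth of rows, only after the layout has been checked
shifted = df.shift(rows_per_date(df))
print(shifted)
```

This guard does not check that the ids appear in the same order on every date, which the positional shift also relies on.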