I have a dataframe with around 200 features and 3000 rows. The samples were logged at different times, roughly one per month, as shown in “col101” in the example below:
0   col1 (id)   col2   col3   …   col100   col101 (date)   …   col2000 (target value)
1   001         653    675    …   343.3    01-02-2017      …   1
2   001         673    432    …   387.3    01-03-2017      …   0
3   001         679    528    …   401.2    01-04-2017      …   1
4   001         685    223    …   503.4    01-05-2017      …   1
5   002         343    428    …   432.5    01-02-2017      …   0
6   002         479    421    …   455.3    01-03-2017      …   0
7   …           …      …      …   …        …               …   …
Some of these features are cumulative, so their values increase every month. For example, col2 and col100 are cumulative features in my dataframe. For each cumulative feature I want to add one more column holding the difference from the previous month's value for the same id (the first month of each id keeps its original value). My desired dataframe should look something like this:
0   col1 (id)   col2   col2c   …   col100   col100c   col101 (date)   …   col2000 (target value)
1   001         653    653     …   343.3    343.3     01-02-2017      …   1
2   001         673    20      …   387.3    44        01-03-2017      …   0
3   001         679    6       …   401.2    13.9      01-04-2017      …   1
4   001         685    6       …   503.4    102.2     01-05-2017      …   1
5   002         343    343     …   432.5    432.5     01-02-2017      …   0
6   002         479    136     …   455.3    22.8      01-03-2017      …   0
7   …           …      …       …   …        …         …               …   …
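To make the target concrete, here is a rough pandas sketch of the transformation I have in mind, built on a toy copy of the rows above. It assumes the columns are literally named col1, col2, col100 and col101, that the dates are day-month-year, and that the list of cumulative columns is already known; none of that is guaranteed for the real data.

```python
import pandas as pd

# Toy copy of the example rows; the column names are placeholders.
df = pd.DataFrame({
    "col1": ["001", "001", "001", "001", "002", "002"],
    "col2": [653, 673, 679, 685, 343, 479],
    "col100": [343.3, 387.3, 401.2, 503.4, 432.5, 455.3],
    "col101": ["01-02-2017", "01-03-2017", "01-04-2017",
               "01-05-2017", "01-02-2017", "01-03-2017"],
})

# Assumption: dates are day-month-year (one sample per month).
df["col101"] = pd.to_datetime(df["col101"], format="%d-%m-%Y")

# Differences only make sense in chronological order within each id.
df = df.sort_values(["col1", "col101"]).reset_index(drop=True)

# Assumption: the cumulative columns are known in advance.
cumulative_cols = ["col2", "col100"]

# Month-over-month difference, computed separately for each id.
diffs = df.groupby("col1")[cumulative_cols].diff()

# The first row of each id has no previous month (NaN after diff);
# keep the original value there, as in the desired table.
diffs = diffs.fillna(df[cumulative_cols])

# Append the new columns as col2c, col100c, ...
df = df.join(diffs.add_suffix("c"))
print(df)
```

This appends the new columns at the end of the dataframe rather than next to their source columns; reordering them would be a separate step.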
Now I have two problems: 1) How can I automatically recognize the cumulative features among the 200 features? 2) How can I add the extra column (e.g., col2c and col100c) for each cumulative attribute? Does anyone know how I can handle this?
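For the first problem, the only idea I have so far is a heuristic: treat a numeric column as cumulative if, within every id and in date order, it never decreases and increases at least once. The sketch below illustrates that idea; the function name find_cumulative_columns and its parameters are made up for illustration, and the check could give false positives when an id has only a few rows.

```python
import pandas as pd

def find_cumulative_columns(df, id_col, date_col, exclude=()):
    """Heuristic: numeric columns that never decrease within any id."""
    ordered = df.sort_values([id_col, date_col])
    skip = [id_col, date_col, *exclude]
    candidates = list(
        ordered.select_dtypes("number").columns.difference(skip)
    )
    # Per-id change from one row to the next (NaN on each id's first row).
    deltas = ordered.groupby(id_col)[candidates].diff()
    # Missing deltas are treated as "no decrease".
    never_decreases = (deltas.fillna(0) >= 0).all()
    # Require a real increase so constant columns are not flagged.
    ever_increases = (deltas > 0).any()
    return [c for c in candidates
            if never_decreases[c] and ever_increases[c]]

# Toy data mirroring the example; "target" stands in for the target column.
toy = pd.DataFrame({
    "col1": ["001", "001", "001", "001", "002", "002"],
    "col2": [653, 673, 679, 685, 343, 479],
    "col3": [675, 432, 528, 223, 428, 421],
    "col100": [343.3, 387.3, 401.2, 503.4, 432.5, 455.3],
    "col101": pd.to_datetime(
        ["01-02-2017", "01-03-2017", "01-04-2017",
         "01-05-2017", "01-02-2017", "01-03-2017"],
        format="%d-%m-%Y",  # assumption: day-month-year
    ),
    "target": [1, 0, 1, 1, 0, 0],
})

found = find_cumulative_columns(toy, "col1", "col101", exclude=["target"])
print(found)
```

Columns that should never be treated as cumulative, such as the target, can be passed through exclude. I am not sure this is robust enough for noisy data, for example counters that occasionally reset.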