How to apply a set of functions to each group of a

2019-05-24 10:54发布

I need to reshape data.frame in R in one step. In short, change of values of objects (x1 to x6) is visible row by row (from 1990 to 1995):

> tab1[1:10, ] # raw data see plot for tab1
   id value year
1  x1     7 1990
2  x1    10 1991
3  x1    11 1992
4  x1     7 1993
5  x1     3 1994
6  x1     1 1995
7  x2     6 1990
8  x2     7 1991
9  x2     9 1992
10 x2     5 1993

I am able to do reshaping step by step, does anybody know how do it in one step?

Original data Table 1 - see that minimal value from all timeseries is "0"

Step1: Table 2 - rescale each timeseries that each would have minimal value equal "0". All times fall down on x-axes.

Step2: Table 3 - apply diff() function on each timeline.

Step3: Table 4 - apply sort() function on each timeseries.

I hope the pictures are clear enough for understanding each step.

So final table looks like this:

> tab4[1:10, ]
   id value time
1  x1    -4    1
2  x1    -4    2
3  x1    -2    3
4  x1     1    4
5  x1     3    5
6  x2    -4    1
7  x2    -3    2
8  x2     1    3
9  x2     1    4
10 x2     2    5

enter image description here

# Source data:
tab1 <- data.frame(id = rep(c("x1","x2","x3","x4","x5","x6"), each = 6),
                   value = c(7,10,11,7,3,1,6,7,9,5,2,3,11,9,7,9,1,
                             0,1,2,2,4,7,4,2,3,1,6,4,2,3,5,4,3,5,6),
                   year = rep(c(1990:1995), times = 6))

tab2 <- data.frame(id = rep(c("x1","x2","x3","x4","x5","x6"), each = 6),
                   value = c(6,9,10,6,2,0,4,5,7,3,0,1,11,9,7,9,1,0,
                             0,1,1,3,6,3,1,2,0,5,3,1,0,2,1,0,2,3),
                   year = rep(c(1990:1995), times = 6))

tab3 <- data.frame(id = rep(c("x1","x2","x3","x4","x5","x6"), each = 5),
                   value = c(3,1,-4,-4,-2,1,2,-4,-3,1,-2,-2,2,-8,-1,
                             1,0,2,3,-3,1,-2,5,-2,-2,2,-1,-1,2,1),
                   time = rep(c(1:5), times = 6))

tab4 <- data.frame(id = rep(c("x1","x2","x3","x4","x5","x6"), each = 5),
                   value = c(-4,-4,-2,1,3,-4,-3,1,1,2,-8,-2,-2,-1,2,
                             -3,0,1,2,3,-2,-2,-2,1,5,-1,-1,1,2,2),
                   time = rep(c(1:5), times = 6))

2条回答
Lonely孤独者°
2楼-- · 2019-05-24 11:11

Using data.table, this is simply:

require(data.table) ## 1.9.2
ans <- setDT(tab1)[, list(value=diff(value)), by=id]  ## aggregation
setkey(ans, id,value)[, time := seq_len(.N), by=id] ## order + add 'time' column

Note that your 'step 1' is unnecessary as your second step is calculating difference and it wouldn't have any effect (and is therefore skipped here).

查看更多
何必那么认真
3楼-- · 2019-05-24 11:22

It sounds like you want to apply a set of functions to each group of a grouping variable. There are many ways to do this in R (from base R by and tapply to add-on packages like plyr, data.table, and dplyr). I've been learning how to use package dplyr, and came up with the following solution.

require(dplyr)

tab4 = tab1 %>%
    group_by(id) %>% # group by id
    mutate(value = value - min(value), value = value - lag(value)) %>% # group min to 0, difference lag 1
    na.omit %>% # remove NA caused by lag 1 differencing
    arrange(id, value) %>% # order by value within each id
    mutate(time = 1:length(value)) %>% # Make a time variable from 1 to 5 based on current order
    select(-year) # remove year column to match final OP output
查看更多
登录 后发表回答