Make new column which adds 30 days to date on ever

Posted 2019-08-07 10:17

I have a data frame df with two variables, names and dates. I would like to create a new column (new_dates) that takes the first date belonging to each person (each person has just one date, repeated across their rows) and adds 30 days for each successive row.

Desired output is below. So row1 for each person holds the original date, row2 holds row1+30, row3 holds row2+30 and so on.

dff
   names      dates  new_dates
1   john 2010-06-01 2010-06-01
2   john 2010-06-01 2010-07-01
3   john 2010-06-01 2010-07-31
4   john 2010-06-01 2010-08-30
5   mary 2010-07-09 2010-07-09
6   mary 2010-07-09 2010-08-08
7   mary 2010-07-09 2010-09-07
8   mary 2010-07-09 2010-10-07
9    tom 2010-06-01 2010-06-01
10   tom 2010-06-01 2010-07-01
11   tom 2010-06-01 2010-07-31
12   tom 2010-06-01 2010-08-30

I thought I could use transform for this. Here is my attempt at it - but it doesn't quite work for me.

dt <- transform(df, new_date = c(dates[2]+30, NA))

Tags: r transform rows
3 Answers
神经病院院长
Reply #2 · 2019-08-07 10:46

Sorry, I read the question quickly and didn't realize at first what you were doing.

This is definitely a brute-force method, and my code is not exactly elegant, but it seems to give the desired result:

df <- psych::read.clipboard()

df <- data.frame(names = df$names,
                 dates = as.Date(df$dates))

library(lubridate)


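# For each person, take their first date and add 0, 30, 60, ... days down the rows.
# sapply() drops the Date class, so the values come back as day counts since 1970-01-01.
tmp <- unlist(lapply(unique(df$names), function(x) {
                  tmp <- df[df$names == x, 2, drop = FALSE]
                  sapply(seq_len(nrow(tmp)), function(y) {
                    tmp[1, 1] + days(30) * (y - 1)
                    })
                } ))

# Convert the day counts back to Date.
df$new_dates <- as.Date(tmp, origin = '1970-01-01')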

> df
   names      dates  new_dates
1   john 2010-06-01 2010-06-01
2   john 2010-06-01 2010-07-01
3   john 2010-06-01 2010-07-31
4   john 2010-06-01 2010-08-30
5   mary 2010-07-09 2010-07-09
6   mary 2010-07-09 2010-08-08
7   mary 2010-07-09 2010-09-07
8   mary 2010-07-09 2010-10-07
9    tom 2010-06-01 2010-06-01
10   tom 2010-06-01 2010-07-01
11   tom 2010-06-01 2010-07-31
12   tom 2010-06-01 2010-08-30
可以哭但决不认输i
Reply #3 · 2019-08-07 10:57

Exactly what you're looking for is a bit unclear to me. I'm assuming that you're starting with a small data frame that looks like this:

> df <- data.frame(names=c("john","mary","tom"),dates=c(as.Date("2010-06-01"),as.Date("2010-07-09"),as.Date("2010-06-01")))
> df
  names      dates
1  john 2010-06-01
2  mary 2010-07-09
3   tom 2010-06-01

And that you then want to add N rows per person to your data frame, along with a new dates column. If so, I'm sure there are some pre-packaged ways to do this, but you could also use two nested lapply() calls. The innermost call simply adds a new column, newdates, set to your original date plus some multiple of 30, and the outermost call passes in that multiple of 30. For example:

> do.call(rbind,lapply(30*0:3,function(y) do.call(rbind,lapply(1:nrow(df),function(x) data.frame(names=df$names[x],dates=df$dates[x],newdates=df$dates[x]+y)))))
   names      dates   newdates
1   john 2010-06-01 2010-06-01
2   mary 2010-07-09 2010-07-09
3    tom 2010-06-01 2010-06-01
4   john 2010-06-01 2010-07-01
5   mary 2010-07-09 2010-08-08
6    tom 2010-06-01 2010-07-01
7   john 2010-06-01 2010-07-31
8   mary 2010-07-09 2010-09-07
9    tom 2010-06-01 2010-07-31
10  john 2010-06-01 2010-08-30
11  mary 2010-07-09 2010-10-07
12   tom 2010-06-01 2010-08-30

Anyway, this method isn't ideal and may be confusing, so let me know if this is what you're looking for and I can provide more details about what is going on.
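
The same expansion can also be sketched without the nested lapply() calls by repeating row indices with rep(). This is only an illustration under assumptions: n (the number of rows per person) and the names idx and out are made up here, and the rows come out grouped by person, not interleaved as in the output above.

```r
# Assumed starting point: one row per person, as in the small data frame above.
df <- data.frame(
  names = c("john", "mary", "tom"),
  dates = as.Date(c("2010-06-01", "2010-07-09", "2010-06-01"))
)

n <- 4  # assumed number of rows wanted per person

# Repeat each row index n times so every person gets n consecutive rows.
idx <- rep(seq_len(nrow(df)), each = n)
out <- df[idx, ]

# Within each person the offsets are 0, 30, 60, ... days.
out$newdates <- out$dates + 30 * rep(0:(n - 1), times = nrow(df))
rownames(out) <- NULL
out
```

Because Date arithmetic in R works in whole days, adding a numeric vector of offsets is enough here; no extra package is needed.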

【Aperson】
Reply #4 · 2019-08-07 10:59

data.table makes this easy. Once you convert to a data table, it's basically one command. The main problem with your version is that you need to split the data by name first, so you can get the minimum date for each person, and then add the appropriate multiple of 30 days to each date.

library(data.table)
df$dates <- as.Date(df$dates)
dt <- as.data.table(df)
dt[, 
   list(dates, new_dates=min(dates) + 0:(length(dates) - 1L) * 30), 
   by=names
]
#     names      dates  new_dates
#  1:  john 2010-06-01 2010-06-01
#  2:  john 2010-06-01 2010-07-01
#  3:  john 2010-06-01 2010-07-31
#  4:  john 2010-06-01 2010-08-30
#  5:  mary 2010-07-09 2010-07-09
#  6:  mary 2010-07-09 2010-08-08
#  7:  mary 2010-07-09 2010-09-07
#  8:  mary 2010-07-09 2010-10-07
#  9:   tom 2010-06-01 2010-06-01
# 10:   tom 2010-06-01 2010-07-01
# 11:   tom 2010-06-01 2010-07-31
# 12:   tom 2010-06-01 2010-08-30

EDIT: here is a version that hopefully shows why yours didn't work. I still prefer data.table, but since this is very close to what you were doing, it should make clear what you need to change:

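# Add new_dates to one person's rows: earliest date plus 0, 30, 60, ... days.
re_date <- function(df) {
  transform(
    df[order(df$dates), ],
    new_dates = min(dates) + 30 * 0:(length(dates) - 1L)
  )
}
# Split by person, apply re_date to each piece, then stack the pieces back together.
do.call(rbind, lapply(split(df, df$names), re_date))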

Starting with the bottom line (do.call...), the split call makes a list with three data frames, one with the values for John, one for those for Mary, and one for those for Tom. The lapply then runs each of those data frames through the re_date function, which adds the new_dates column, and finally, the do.call/rbind stitches it back together into one data frame.
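
For comparison, the split/apply/combine steps described here can be sketched in a single base R call with ave(), which applies a function within each group and returns the results in the original row order. This is a swapped-in alternative, not the approach from the answer; the data frame below is rebuilt from the question's desired output, and working on as.numeric() day counts is a cautious assumption to avoid relying on how ave() handles the Date class.

```r
# Rebuild the question's data: four rows per person, one repeated date each.
df <- data.frame(
  names = rep(c("john", "mary", "tom"), each = 4),
  dates = as.Date(rep(c("2010-06-01", "2010-07-09", "2010-06-01"), each = 4))
)

# Within each person: earliest date (as a day count) plus 0, 30, 60, ... days.
day_counts <- ave(
  as.numeric(df$dates), df$names,
  FUN = function(d) min(d) + 30 * (seq_along(d) - 1)
)

# Convert the day counts back to Date.
df$new_dates <- as.Date(day_counts, origin = "1970-01-01")
df
```

Unlike the do.call(rbind, ...) version, this keeps the original row names and row order, since ave() writes each group's result back into the positions it came from.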
