repeat rows in a dataset based on a column, but in

Posted 2019-07-20 20:46

I have a dataset with project name, start year, and contract term. I need to expand it into a time series. For example, one row in my dataset is: Project A, start year 2003, contract term 5. I would like to repeat each row as many times as its contract term, incrementing the year on each repeat. My dataset looks like this:

Project Name    Start Year    Contract Term
A                 2003            5
B                 2013            3
C                 2000            2

My desired result should look like this:

Project Name    Start Year    Contract Term
A                 2003            5
A                 2004            5
A                 2005            5
A                 2006            5
A                 2007            5

B                 2013            3
B                 2014            3
B                 2015            3

C                 2000            2
C                 2001            2

I have tried:

rpsData <- rpsInput[rep(rownames(rpsInput), rpsInput$Contract.Term), ]

But this only repeats each project by the number in contract term. I cannot get it to increment the years.

Thanks in advance!

Tags: r dataframe rep
2 answers
乱世女痞
#2 · 2019-07-20 20:54

Here it is in two steps:

Step 1, which you already have:

rpsData <- rpsInput[rep(rownames(rpsInput), rpsInput$Contract.Term), ]
rpsData
#     Project.Name Start.Year Contract.Term
# 1              A       2003             5
# 1.1            A       2003             5
# 1.2            A       2003             5
# 1.3            A       2003             5
# 1.4            A       2003             5
# 2              B       2013             3
# 2.1            B       2013             3
# 2.2            B       2013             3
# 3              C       2000             2
# 3.1            C       2000             2

Step 2 makes use of sequence and basic addition:

sequence(rpsInput$Contract.Term) ## This will be helpful...
#  [1] 1 2 3 4 5 1 2 3 1 2

rpsData$Start.Year <- rpsData$Start.Year + sequence(rpsInput$Contract.Term)
rpsData
#     Project.Name Start.Year Contract.Term
# 1              A       2004             5
# 1.1            A       2005             5
# 1.2            A       2006             5
# 1.3            A       2007             5
# 1.4            A       2008             5
# 2              B       2014             3
# 2.1            B       2015             3
# 2.2            B       2016             3
# 3              C       2001             2
# 3.1            C       2002             2
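
For readers unfamiliar with sequence(), here is a minimal sketch of what it returns for the contract terms in the question. The variable names terms, offsets, and manual are illustrative only and do not appear in the answer.

```r
# Contract terms from the question's example data
terms <- c(5, 3, 2)

# sequence() concatenates seq_len(n) for each n in its argument
offsets <- sequence(terms)

# The same vector, spelled out with lapply() for comparison
manual <- unlist(lapply(terms, seq_len))

print(offsets)
print(all(offsets == manual))
```

Because these offsets start at 1, adding them directly moves every project one year past its start year, which is why the output above begins at 2004 rather than 2003.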
小情绪 Triste *
#3 · 2019-07-20 21:10

Just to piggyback on Ananda's answer, change

sequence(rpsInput$Contract.Term)

to

(sequence(rpsInput$Contract.Term)-1)

to get the output you desire.

ProjectName <- c("A", "B", "C")
Start.Year <- c(2003, 2013, 2000)
Contract.Term <- c(5, 3, 2)
rpsInput <- data.frame(ProjectName, Start.Year, Contract.Term)
# Repeat each row Contract.Term times
rpsData <- rpsInput[rep(rownames(rpsInput), rpsInput$Contract.Term), ]
# Add 0-based offsets so each project starts at its own Start.Year
rpsData$Start.Year <- rpsData$Start.Year + (sequence(rpsInput$Contract.Term) - 1)
rpsData
#    ProjectName Start.Year Contract.Term
#1             A       2003             5
#1.1           A       2004             5
#1.2           A       2005             5
#1.3           A       2006             5
#1.4           A       2007             5
#2             B       2013             3
#2.1           B       2014             3
#2.2           B       2015             3
#3             C       2000             2
#3.1           C       2001             2
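
If this is needed in more than one place, the two steps could be wrapped in a small function. The following is only a sketch: the name expand_by_term and its year_col and term_col arguments are hypothetical, and it assumes the term column holds non-negative whole numbers.

```r
# Hypothetical helper (not from the answers above): repeat each row
# `term` times and advance the year column by 0, 1, ..., term - 1.
expand_by_term <- function(df, year_col = "Start.Year",
                           term_col = "Contract.Term") {
  terms <- df[[term_col]]
  # Row i appears terms[i] times
  out <- df[rep(seq_len(nrow(df)), terms), , drop = FALSE]
  # sequence(terms) - 1 gives 0-based offsets within each project
  out[[year_col]] <- out[[year_col]] + sequence(terms) - 1
  # Replace row names such as 1, 1.1, 1.2 with plain 1..n
  rownames(out) <- NULL
  out
}

rpsInput <- data.frame(
  ProjectName = c("A", "B", "C"),
  Start.Year = c(2003, 2013, 2000),
  Contract.Term = c(5, 3, 2)
)

rpsData <- expand_by_term(rpsInput)
print(rpsData)
```

Indexing with seq_len(nrow(df)) instead of rownames() is a design choice here: it keeps working if the input has non-default row names, and resetting the row names afterwards avoids the 1.1, 1.2 suffixes.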