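I have a data frame with a value column covering multiple years. The years may not be in order, and some years may be missing. Here is an example data frame: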
df = data.frame(code = c("AFG", "AGO", "ALB", "AND", "ARB", "ARE", "ARG", "ARM", "ASM", "ATG", "AUS", "AUT", "AUT", "AUT", "AUT", "ABW", "AFG", "AGO", "ALB", "AND", "ARB", "ARE", "ARG", "ARM", "ARM"),
PPT = c(123, 42, 23, 5, 42, 4, 23, 25, 42, 23, NA, 5563, 56, 54, 645, 6, 4,53, 656, 65, 5563, 646, 6, 66, 54),
Year = c(1990, 1991, 1992, 1993, 1991, 1995, 1996, 1997, 1991, 1992, 2000, 2001, 2002, 2014, 2004, 2005, 2006, 2007, 1960, 2009, NA, 2011, 2012, 2013, 2014))
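I would like to add a column holding the difference between each year's value and the value five years later (Year + 5). For example, if the first year in the Year column is 1960 but no PPT data is available for 1965, the value in new_col should be NA. Similarly, new_col is 119 (123 - 4) for 1990, NA for 2000 (no PPT data available for 2005), 19 for 1991, -2 for 1992, and so on.
I have a very convoluted way of doing this in Excel, but I am looking for a simple solution in R.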
Answer 1:
We can arrange by 'Year' and subtract from 'PPT' the lead of 'PPT', with n specified as 5.
library(dplyr)
df %>%
arrange(Year) %>%
mutate(newcol = PPT - lead(PPT, n = 5, default = 0))
# code PPT Year newcol
#1 AFG 123 1990 119
#2 AGO 42 1991 19
#3 ALB 23 1992 -2
#4 AND 5 1993 -1
#5 ARB 23 1994 -611
#6 ARE 4 1995 -1
#7 ARG 23 1996 -5540
#8 ARM 25 1997 -31
#9 ASM 6 1998 -50
#10 ATG 634 1999 -11
#...
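To make the role of `n` and `default` concrete, here is a minimal base R sketch of the shift that `lead()` performs. The helper `lead_by` is hypothetical (it is not part of dplyr) and only mimics the idea; the vector holds the first ten PPT values from the Data block below. With a default of 0 the last five differences are just the PPT values themselves, whereas an NA default leaves them as NA, which is what the question asks for.

```r
# Hypothetical helper: shift x forward by n positions and pad the tail with `default`.
lead_by <- function(x, n, default = NA) {
  stopifnot(n >= 0, n <= length(x))
  c(x[seq_len(length(x) - n) + n], rep(default, n))
}

# First ten PPT values of the answer's data (years 1990 to 1999).
ppt <- c(123, 42, 23, 5, 23, 4, 23, 25, 6, 634)

diff_na   <- ppt - lead_by(ppt, 5)               # tail is NA
diff_zero <- ppt - lead_by(ppt, 5, default = 0)  # tail equals ppt itself

print(diff_na)
print(diff_zero)
```

This only works when the rows are sorted by year and no year is skipped, because the shift is positional.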
If some of the years are missing, we can expand the data with complete and then do the mutate.
library(tidyr)
df %>%
arrange(Year) %>%
complete(Year = min(Year):max(Year)) %>%
mutate(newcol = PPT - lead(PPT, n = 5, default = 0)) %>%
filter(!is.na(PPT))
Or using base R:
df$newcol <- with(df, c(head(PPT, -5) - tail(PPT, -5), tail(PPT, 5)))
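The one-liner above is positional, so it relies on the rows being sorted by year without gaps. As an alternative sketch (not taken from the answer), `match()` can look up the row for Year + 5 directly. The data frame `toy` is made up for illustration, with years out of order and 1993 and 1994 absent.

```r
# Made-up illustration data: unordered years, some years missing.
toy <- data.frame(
  Year = c(1995, 1990, 1991, 1992, 1996, 1997),
  PPT  = c(4, 123, 42, 23, 23, 25)
)

# For each row, find the position of the row whose Year equals Year + 5 (NA if none).
idx <- match(toy$Year + 5, toy$Year)

# Indexing with NA yields NA, so rows without a Year + 5 partner get NA.
toy$newcol <- toy$PPT - toy$PPT[idx]
print(toy)
```

Note that `match()` returns only the first match, so if a year appears more than once this sketch silently uses the first occurrence.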
Data
df <- structure(list(code = structure(c(2L, 3L, 4L, 5L, 6L, 7L, 8L,
9L, 10L, 11L, 12L, 13L, 13L, 13L, 13L, 1L, 2L, 3L, 4L, 5L, 6L,
7L, 8L, 9L, 9L), .Label = c("ABW", "AFG", "AGO", "ALB", "AND",
"ARB", "ARE", "ARG", "ARM", "ASM", "ATG", "AUS", "AUT"), class = "factor"),
PPT = c(123, 42, 23, 5, 23, 4, 23, 25, 6, 634, 5, 5563, 56,
56, 645, 6, 4, 656, 645, 65, 5563, 646, 6, 66, 54),
Year = 1990:2014), class = "data.frame", row.names = c(NA,
-25L))
Answer 2:
A data.table solution that also works with missing or skipped years...
Sample data
df = data.frame(code = c("AFG", "AGO", "ALB", "AND", "ARB", "ARE", "ARG", "ARM", "ASM", "ATG", "AUS", "AUT", "AUT", "AUT", "AUT", "ABW", "AFG", "AGO", "ALB", "AND", "ARB", "ARE", "ARG", "ARM", "ARM"),
PPT = c(123, 42, 23, 5, 23, 4, 23, 25, 6, 634, 5, 5563, 56, 56, 645, 6, 4, 656, 645, 65, 5563, 646, 6, 66, 54),
Year = c(1990:2014))
Code
library(data.table)
#create a data.table with all years from the minimum until the maximum + 5,
#so missing years will get an NA!
#perform a by-reference join on these years, by Year
result <- data.table( Year = min(df$Year):(max(df$Year) + 5) )[setDT(df), `:=`(code = i.code, PPT = i.PPT), on = .(Year)]
#calculate the desired column, delete unwanted rows
result[, newcol := PPT - shift(PPT, 5, type = "lead" )][!is.na(code),][]
Output
# Year code PPT newcol
# 1: 1990 AFG 123 119
# 2: 1991 AGO 42 19
# 3: 1992 ALB 23 -2
# 4: 1993 AND 5 -1
# 5: 1994 ARB 23 -611
# 6: 1995 ARE 4 -1
# 7: 1996 ARG 23 -5540
# 8: 1997 ARM 25 -31
# 9: 1998 ASM 6 -50
# 10: 1999 ATG 634 -11
# 11: 2000 AUS 5 -1
# 12: 2001 AUT 5563 5559
# 13: 2002 AUT 56 -600
# 14: 2003 AUT 56 -589
# 15: 2004 AUT 645 580
# 16: 2005 ABW 6 -5557
# 17: 2006 AFG 4 -642
# 18: 2007 AGO 656 650
# 19: 2008 ALB 645 579
# 20: 2009 AND 65 11
# 21: 2010 ARB 5563 NA
# 22: 2011 ARE 646 NA
# 23: 2012 ARG 6 NA
# 24: 2013 ARM 66 NA
# 25: 2014 ARM 54 NA
# Year code PPT newcol
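The same join idea can be sketched without data.table, using base R `merge()`. This is an illustrative variant rather than the answer's method: the data frame `toy` and the column name `PPT_later` are made up. A copy of the data has its years shifted back by five, so that joining on Year pairs each row with the value five years later.

```r
# Made-up illustration data with gaps between years.
toy <- data.frame(
  Year = c(1990, 1991, 1995, 1996, 2000),
  PPT  = c(123, 42, 4, 23, 5)
)

# Copy with years shifted back by 5: its Year now names the row it should be subtracted from.
later <- data.frame(Year = toy$Year - 5, PPT_later = toy$PPT)

# Left join: years without a Year + 5 partner get NA in PPT_later.
res <- merge(toy, later, by = "Year", all.x = TRUE)
res$newcol <- res$PPT - res$PPT_later
print(res)
```

Unlike the positional shift, this does not need a row for every intermediate year, but duplicated years would produce duplicated rows in the result.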
Answer 3:
We can also use mapply:
df$new_col <- mapply(function(x, y) {
inds = df$Year == y + 5
if (any(inds)) x - df$PPT[inds] else x
},df$PPT, df$Year)
df
# code PPT Year new_col
#1 AFG 123 1990 119
#2 AGO 42 1991 19
#3 ALB 23 1992 -2
#4 AND 5 1993 -1
#5 ARB 23 1994 -611
#6 ARE 4 1995 -1
#7 ARG 23 1996 -5540
#8 ARM 25 1997 -31
#9 ASM 6 1998 -50
#10 ATG 634 1999 -11
#.....
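The function above returns the original PPT value when no row exists for Year + 5. If NA is wanted in that case, as the question describes, one possible variant is sketched below with `vapply()`. The data frame `toy` is made up for illustration, and the choice to return NA when Year + 5 appears more than once is an assumption, not something the question specifies.

```r
# Made-up illustration data.
toy <- data.frame(
  Year = c(1990, 1995, 2000, 2001),
  PPT  = c(123, 4, 10, 7)
)

toy$new_col <- vapply(seq_len(nrow(toy)), function(i) {
  # PPT values recorded for the year five years after row i.
  later <- toy$PPT[which(toy$Year == toy$Year[i] + 5)]
  # Subtract only when exactly one such row exists; otherwise return NA (assumed behavior).
  if (length(later) == 1) toy$PPT[i] - later else NA_real_
}, numeric(1))

print(toy)
```

Using `vapply()` with `numeric(1)` guarantees a plain numeric column, whereas `mapply()` can fall back to a list when the function returns results of different lengths.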
Source: Arithmetic operation based on value from another column