I want to replace with zero the monthly values that are after a specific month by row. I have tried adapting Replace NA values in dataframe starting in varying columns without success. Given data:
df <- structure(list(Mth1 = c(1L, 3L, 4L, 1L, 2L),
Mth2 = c(2L, 3L, 2L, 2L, 2L),
Mth3 = c(1L, 2L, 1L, 2L, 3L),
Mth4 = c(3L, 1L, 3L, 4L, 2L),
ZeroMth = c(1L, 3L, 2L, 4L, 3L)),
.Names = c("Mth1", "Mth2", "Mth3", "Mth4", "ZeroMth"), class = "data.frame",
row.names = c("1", "2", "3", "4", "5"))
> df
Mth1 Mth2 Mth3 Mth4 ZeroMth
1 1 2 1 3 1
2 3 3 2 1 3
3 4 2 1 3 2
4 1 2 2 4 4
5 2 2 3 2 3
I would like to use the values in the ZeroMth column to specify the month where the replacements start. The desired output is:
> df1
Mth1 Mth2 Mth3 Mth4
1 0 0 0 0
2 3 3 0 0
3 4 0 0 0
4 1 2 2 0
5 2 2 0 0
Use apply
on each row (MARGIN = 1
) and replace
the values after the index specified in the last column to be zero
t(apply(X = df, MARGIN = 1, function(x)
replace(x = x, list = x[NCOL(df)]:(NCOL(df)-1), values = 0)))
# Mth1 Mth2 Mth3 Mth4 ZeroMth
#1 0 0 0 0 1
#2 3 3 0 0 3
#3 4 0 0 0 2
#4 1 2 2 0 4
#5 2 2 0 0 3
You could also use lapply
like this
setNames(data.frame(lapply(head(seq_along(df), -1), function(i) df[, i] * (i < df$ZeroMth))),
head(names(df), -1))
which returns
Mth1 Mth2 Mth3 Mth4
1 0 0 0 0
2 3 3 0 0
3 4 0 0 0
4 1 2 2 0
5 2 2 0 0
Here, you run through the locations of the month vectors and check if the element in the month is less than the designated zero month. If yes, the value is returned, otherwise it is 0. setNames
is used to restore the variable names.
Some benchmarks
After testing, changing lapply
to sapply
results in more than a 2X speedup. The major slowdown is due to the conversion to data.frame.
This led me to check a bit further. Here are microbenchmark results.
microbenchmark(
db.mat=t(apply(X = df, MARGIN = 1, function(x)
replace(x = x, list = x[NCOL(df)]:(NCOL(df)-1), values = 0))),
db.df=data.frame(t(apply(X = df, MARGIN = 1, function(x)
replace(x = x, list = x[NCOL(df)]:(NCOL(df)-1), values = 0)))),
lmo.list=setNames(lapply(head(seq_along(df), -1),
function(i) df[, i] * (i < df$ZeroMth)),
head(names(df), -1)),
lmo.dfl=setNames(data.frame(lapply(head(seq_along(df), -1),
function(i) df[, i] * (i < df$ZeroMth))),
head(names(df), -1)),
lmo.dfs=setNames(data.frame(sapply(head(seq_along(df), -1),
function(i) df[, i] * (i < df$ZeroMth))),
head(names(df), -1)),
lmo.listAlt=setNames(lapply(head(seq_along(df), -1),
function(i) {temp <- df[, i]; temp[i < df$ZeroMth] <- 0; temp}),
head(names(df), -1)),
lmo.dflAlt=setNames(data.frame(lapply(head(seq_along(df), -1),
function(i) {temp <- df[, i]; temp[i < df$ZeroMth] <- 0; temp})),
head(names(df), -1)),
lmo.dfsAlt=setNames(data.frame(sapply(head(seq_along(df), -1),
function(i) {temp <- df[, i]; temp[i < df$ZeroMth] <- 0; temp})),
head(names(df), -1)))
Unit: microseconds
expr min lq mean median uq max neval cld
df.mat 135.994 155.2380 161.2480 159.6570 166.785 196.436 100 b
db.df 225.231 236.9190 248.3295 246.0430 256.164 340.411 100 c
lmo.list 84.960 99.5005 105.8299 104.9175 110.905 156.806 100 a
lmo.dfl 439.057 459.1565 480.3425 476.5475 492.656 647.751 100 d
lmo.dfs 173.057 187.3120 217.2876 195.8650 202.850 2257.151 100 c
lmo.listAlt 91.803 108.0535 114.6253 113.1860 118.602 185.602 100 ab
lmo.dflAlt 458.158 481.2520 521.6052 498.2155 516.462 2584.163 100 d
lmo.dfsAlt 181.610 198.4310 221.5613 204.2755 212.686 1611.395 100 c
Wow, lapply
with data.frame
is super slow.
We can also make this compact by
(col(df[-5]) <df$ZeroMth[row(df[-5])])*df[-5]
# Mth1 Mth2 Mth3 Mth4
#1 0 0 0 0
#2 3 3 0 0
#3 4 0 0 0
#4 1 2 2 0
#5 2 2 0 0