I want to replace with zero the monthly values from a specific month onward, where that month differs by row. I have tried adapting "Replace NA values in dataframe starting in varying columns" without success. Given data:
df <- structure(list(Mth1 = c(1L, 3L, 4L, 1L, 2L),
Mth2 = c(2L, 3L, 2L, 2L, 2L),
Mth3 = c(1L, 2L, 1L, 2L, 3L),
Mth4 = c(3L, 1L, 3L, 4L, 2L),
ZeroMth = c(1L, 3L, 2L, 4L, 3L)),
.Names = c("Mth1", "Mth2", "Mth3", "Mth4", "ZeroMth"), class = "data.frame",
row.names = c("1", "2", "3", "4", "5"))
> df
Mth1 Mth2 Mth3 Mth4 ZeroMth
1 1 2 1 3 1
2 3 3 2 1 3
3 4 2 1 3 2
4 1 2 2 4 4
5 2 2 3 2 3
I would like to use the values in the ZeroMth column to specify the month where the replacements start. The desired output is:
> df1
Mth1 Mth2 Mth3 Mth4
1 0 0 0 0
2 3 3 0 0
3 4 0 0 0
4 1 2 2 0
5 2 2 0 0
You could also use lapply, which returns the desired output.
Here, you run through the positions of the month columns and check whether each month's position is less than the row's designated zero month (ZeroMth). If it is, the value is kept; otherwise it is set to 0.
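A minimal sketch of this lapply approach, assuming the month columns are the first four columns of df (month_cols is an illustrative name, not from the original). For each month position i, ifelse keeps the value while i is below the row's ZeroMth and returns 0 otherwise; setNames then puts the Mth1 to Mth4 names back on the unnamed list of columns.

```r
# Sample data from the question
df <- data.frame(Mth1    = c(1L, 3L, 4L, 1L, 2L),
                 Mth2    = c(2L, 3L, 2L, 2L, 2L),
                 Mth3    = c(1L, 2L, 1L, 2L, 3L),
                 Mth4    = c(3L, 1L, 3L, 4L, 2L),
                 ZeroMth = c(1L, 3L, 2L, 4L, 3L))

month_cols <- 1:4  # assumed positions of Mth1..Mth4

# For month position i, keep df[[i]] where i < ZeroMth, otherwise use 0.
# lapply returns an unnamed list, so setNames restores the column names.
df1 <- setNames(
  data.frame(lapply(month_cols, function(i) ifelse(i < df$ZeroMth, df[[i]], 0))),
  names(df)[month_cols]
)
print(df1)
```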
setNames is used to restore the variable names.
Some benchmarks
After testing, I found that changing lapply to sapply gives more than a 2X speedup. The major slowdown is due to the conversion to data.frame: sapply simplifies the per-column results to a matrix, so no data.frame is built. This led me to check a bit further with microbenchmark.
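To illustrate where the time goes, here is a sketch that wraps both variants in hypothetical helper functions (via_lapply and via_sapply are illustrative names) and times them with base R's system.time instead of the microbenchmark package. Timings depend on the machine, so none are quoted here.

```r
df <- data.frame(Mth1    = c(1L, 3L, 4L, 1L, 2L),
                 Mth2    = c(2L, 3L, 2L, 2L, 2L),
                 Mth3    = c(1L, 2L, 1L, 2L, 3L),
                 Mth4    = c(3L, 1L, 3L, 4L, 2L),
                 ZeroMth = c(1L, 3L, 2L, 4L, 3L))

# lapply version: builds a list of columns, then converts it to a data.frame
via_lapply <- function(d, cols = 1:4) {
  setNames(data.frame(lapply(cols, function(i) ifelse(i < d$ZeroMth, d[[i]], 0))),
           names(d)[cols])
}

# sapply version: equal-length results simplify to a matrix (rows x months),
# so no data.frame is constructed
via_sapply <- function(d, cols = 1:4) {
  m <- sapply(cols, function(i) ifelse(i < d$ZeroMth, d[[i]], 0))
  colnames(m) <- names(d)[cols]
  m
}

# Rough timing: repeat each call many times
n <- 2000
print(system.time(for (k in seq_len(n)) via_lapply(df)))
print(system.time(for (k in seq_len(n)) via_sapply(df)))
```

Note that sapply only simplifies to a matrix when there is more than one row; with a single row it returns a plain vector, and setting colnames on it would fail.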
Wow, lapply with data.frame is super slow.
We can also make this more compact.
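One compact, fully vectorized possibility (a sketch, not necessarily the form that answer had in mind) builds a logical mask with col(). col(df1) gives each cell's column index, and comparing that 5 x 4 matrix with df$ZeroMth (length 5) recycles ZeroMth down each column, so every cell is compared with its own row's zero month. It assumes the month columns are the first four.

```r
df <- data.frame(Mth1    = c(1L, 3L, 4L, 1L, 2L),
                 Mth2    = c(2L, 3L, 2L, 2L, 2L),
                 Mth3    = c(1L, 2L, 1L, 2L, 3L),
                 Mth4    = c(3L, 1L, 3L, 4L, 2L),
                 ZeroMth = c(1L, 3L, 2L, 4L, 3L))

df1 <- df[1:4]  # month columns only (assumed to be the first four)

# TRUE wherever the month position is at or after the row's ZeroMth
mask <- col(df1) >= df$ZeroMth
df1[mask] <- 0
print(df1)
```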
Use apply on each row (MARGIN = 1) and replace with zero the values from the index given in the last column (ZeroMth) onward.
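A sketch of the apply approach, assuming ZeroMth is the fifth column. apply() first converts df to a matrix, which is harmless here because every column is an integer, and hands each row to the function as a named vector. Because apply stacks the per-row results as columns, the result is transposed back with t().

```r
df <- data.frame(Mth1    = c(1L, 3L, 4L, 1L, 2L),
                 Mth2    = c(2L, 3L, 2L, 2L, 2L),
                 Mth3    = c(1L, 2L, 1L, 2L, 3L),
                 Mth4    = c(3L, 1L, 3L, 4L, 2L),
                 ZeroMth = c(1L, 3L, 2L, 4L, 3L))

res <- t(apply(df, 1, function(x) {
  months <- x[-5]  # Mth1..Mth4, names kept
  # zero every month position at or after this row's ZeroMth (x[5])
  replace(months, seq_along(months) >= x[5], 0)
}))
print(res)
```

Using seq_along(months) >= x[5] rather than an index range such as x[5]:4 avoids a surprise if ZeroMth ever exceeded 4, since 5:4 counts down to c(5, 4). Wrap the result in as.data.frame() if a data frame is needed.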