Is there an elegant way to treat NA as 0 (like na.rm = TRUE) in dplyr?
library(dplyr)
data <- data.frame(a = c(1, 2, 3, 4), b = c(4, NA, 5, 6), c = c(7, 8, 9, NA))
data %>% mutate(sum = a + b + c)
a  b   c   sum
1  4   7   12
2  NA  8   NA
3  5   9   17
4  6   NA  NA
But I would like to get:
a  b   c   sum
1  4   7   12
2  NA  8   10
3  5   9   17
4  6   NA  10
even though I know that this is not the desired result in many other cases.
Try this:
Resulting data is:
You could use this:
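For example, here is a minimal sketch of a row-wise sum that skips NA, assuming dplyr's rowwise() is acceptable for data of this size. This is an illustration and may not be the exact code originally posted; the name rowwise_result is made up for the example.

```r
library(dplyr)

data <- data.frame(a = c(1, 2, 3, 4), b = c(4, NA, 5, 6), c = c(7, 8, 9, NA))

# rowwise() makes mutate() evaluate sum() once per row,
# so na.rm = TRUE drops only the NA values within that row.
rowwise_result <- data %>%
  rowwise() %>%
  mutate(sum = sum(a, b, c, na.rm = TRUE)) %>%
  ungroup()  # drop the row-wise grouping again

print(rowwise_result)
```

Note that rowwise() tends to be slow on large data frames, since sum() is called once per row.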
Output:
Another option:
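A vectorised alternative, sketched here with base R's rowSums() inside mutate(). It assumes every column of the data frame is numeric and should be summed; rowsums_result is a name chosen only for this example.

```r
library(dplyr)

data <- data.frame(a = c(1, 2, 3, 4), b = c(4, NA, 5, 6), c = c(7, 8, 9, NA))

# `.` is the magrittr placeholder for the piped data frame;
# rowSums() sums across columns and ignores NA when na.rm = TRUE.
rowsums_result <- data %>%
  mutate(sum = rowSums(., na.rm = TRUE))

print(rowsums_result)
```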
Benchmark
Or we can replace NA with 0 and then use the OP's code.
Based on the benchmarks using @Steven Beaupré's data, it seems to be efficient as well.
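As a rough sketch of the replace-then-add idea, assuming dplyr >= 1.0.0 for across() (the original answer may well have used a different helper; zero_filled is just an illustrative name):

```r
library(dplyr)

data <- data.frame(a = c(1, 2, 3, 4), b = c(4, NA, 5, 6), c = c(7, 8, 9, NA))

# Step 1: overwrite NA with 0 in the columns to be summed.
# Step 2: reuse the plain a + b + c from the question.
zero_filled <- data %>%
  mutate(across(c(a, b, c), ~ replace(.x, is.na(.x), 0))) %>%
  mutate(sum = a + b + c)

print(zero_filled)
```

Unlike the output requested in the question, this leaves 0 rather than NA in columns b and c, so keep a copy of the original columns if the missingness matters later.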
Here's a similar approach to Steven's, but it includes dplyr::select() to explicitly state which columns to include/ignore (like ID variables).
It has comparable performance with a realistically sized dataset. I'm not sure why, though, since no columns are actually being excluded in this skinny example.
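A minimal sketch of the select()-based variant described above. The id column and the names data_with_id and selected_result are hypothetical, added only to show a column being left out of the sum.

```r
library(dplyr)

# Hypothetical ID column added to the question's data for illustration.
data_with_id <- data.frame(
  id = c("w", "x", "y", "z"),
  a = c(1, 2, 3, 4),
  b = c(4, NA, 5, 6),
  c = c(7, 8, 9, NA)
)

# select() restricts rowSums() to the named columns, so the
# character id column is never passed to the sum.
selected_result <- data_with_id %>%
  mutate(sum = rowSums(dplyr::select(., a, b, c), na.rm = TRUE))

print(selected_result)
```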
Bigger dataset of 1M rows:
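One way such a comparison could be set up is sketched below. It uses base R's system.time() rather than a dedicated benchmarking package, and the simulated data (values 1 to 10 with some NA, fixed seed) is an assumption for illustration, not the dataset behind the timings reported in this thread.

```r
library(dplyr)

set.seed(1)
n <- 1e6

# Simulated columns: integers 1..10 with NA mixed in.
big <- data.frame(
  a = sample(c(1:10, NA), n, replace = TRUE),
  b = sample(c(1:10, NA), n, replace = TRUE),
  c = sample(c(1:10, NA), n, replace = TRUE)
)

# Approach 1: rowSums() over explicitly selected columns.
time_rowsums <- system.time(
  res_rowsums <- big %>%
    mutate(sum = rowSums(dplyr::select(., a, b, c), na.rm = TRUE))
)

# Approach 2: replace NA with 0, then add the columns.
time_replace <- system.time(
  res_replace <- big %>%
    mutate(across(c(a, b, c), ~ replace(.x, is.na(.x), 0))) %>%
    mutate(sum = a + b + c)
)

print(time_rowsums)
print(time_replace)
```

Timings will vary by machine and dplyr version, so treat any single run as indicative only.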
Results: