How to simplify a leading-NA count function, and g

Posted 2019-08-05 11:28

Question:

I wrote a leading-NA count function that works on vectors. However:

a) Can you simplify my version?

b) Can you also generalize it to work directly on a matrix or data frame, while still working on an individual vector, so that I don't need apply()? Please avoid all *apply functions and fully vectorize it, with no special-casing if at all possible.

leading_NA_count <- function(x) { max(cumsum((1:length(x)) == cumsum(is.na(x)))) }
# v0.1: works but seems clunky, slow and unlikely to be generalizable to two-dimensional objects

leading_NA_count <- function(x) { max(which(1:(length(x)) == cumsum(is.na(x))), 0) }
# v0.2: maybe simpler; needs max(..., 0) so that max() does not return -Inf (with a warning) when which(...) is empty, i.e. when there are no leading NAs, e.g. c(1,2,3)

# (I couldn't figure out how to use which.max/which.min for this)


leading_NA_count <- function(x) { max(cumsum((1:length(x)) == cumsum(is.na(x)))) }
set.seed(1234)
mm <- matrix(sample(c(NA, NA, NA, NA, NA, 0, 1, 2), 6*5, replace = TRUE), nrow = 6, ncol = 5)
mm
     [,1] [,2] [,3] [,4] [,5]
[1,]   NA   NA   NA   NA   NA
[2,]   NA   NA    2   NA    1
[3,]   NA    0   NA   NA   NA
[4,]   NA   NA    1   NA    2
[5,]    1    0   NA   NA    1
[6,]    0   NA   NA   NA   NA

leading_NA_count(mm)
[1] 4 # WRONG: the matrix is treated as one long column-major vector, so this counts the leading NAs of the flattened data (the first four entries of column 1)
apply(mm, 1, leading_NA_count)
[1] 5 2 1 2 0 0 # RIGHT: one count per row
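
To see why the direct call on a matrix goes wrong, here is a minimal sketch. It assumes a small hypothetical 2x2 matrix `m` (not the `mm` above) and reuses the v0.1 function unchanged. The point it illustrates is that `cumsum()` drops the `dim` attribute, so the function only ever sees one flattened, column-major vector.

```r
leading_NA_count <- function(x) { max(cumsum((1:length(x)) == cumsum(is.na(x)))) }

# Hypothetical 2x2 example (not the mm from the question).
m <- matrix(c(NA,  1,
              NA, NA), nrow = 2, byrow = TRUE)

as.vector(m)                  # column-major order: NA NA 1 NA
flat <- cumsum(is.na(m))      # cumsum() returns a plain vector
is.null(dim(flat))            # TRUE: the matrix shape is gone

whole  <- leading_NA_count(m)            # leading NAs of the flattened vector
by_row <- apply(m, 1, leading_NA_count)  # one count per row
```

Here `whole` describes the flattened data as a whole, while `by_row` gives the per-row counts that the question is actually after.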

Answer 1:

This works whether mm is a matrix, a vector, or a data.frame; a vector is treated as a single row. See ?max.col for more information:

max.col(cbind(!is.na(rbind(NA, mm)), TRUE), ties.method = "first")[-1] - 1
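
The one-liner is dense, so here is a step-by-step sketch of what each piece appears to do. The matrix `m` and the intermediate names (`padded`, `flags`, `first_non_na`, `counts`) are made up for illustration and are not part of the original answer.

```r
# Hypothetical example matrix; leading NAs are counted along each row.
m <- matrix(c(NA, NA,  1,
              NA,  2, NA,
               3, NA, NA,
              NA, NA, NA), nrow = 4, byrow = TRUE)

# Step 1: prepend a dummy row. For a plain vector, rbind() turns it into a
# single row under the dummy row, so vectors and matrices share one code path.
padded <- rbind(NA, m)

# Step 2: flag non-NA cells and append an all-TRUE sentinel column, so every
# row has at least one TRUE even when the row is entirely NA.
flags <- cbind(!is.na(padded), TRUE)

# Step 3: column index of the first TRUE in each row.
first_non_na <- max.col(flags, ties.method = "first")

# Step 4: drop the dummy row and turn the position into a count.
counts <- first_non_na[-1] - 1
counts

# The same pipeline on a plain vector yields a single count.
v <- c(NA, NA, 5)
vec_count <- max.col(cbind(!is.na(rbind(NA, v)), TRUE), ties.method = "first")[-1] - 1
```

The sentinel column is what makes an all-NA row come out as the full row length rather than an arbitrary position.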

Answer 2:

For part (a) of your question, this is the simplest function I could think of:

leadingNaCount = function(x) { sum(cumprod(is.na(x))) }
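
A short walk-through of why this works, as a sketch on made-up input vectors: `is.na(x)` is TRUE for each NA, and `cumprod()` keeps a running product that stays 1 only while every element so far has been NA, dropping to 0 for good at the first non-NA value. Summing those 1s gives the length of the leading run.

```r
leadingNaCount <- function(x) { sum(cumprod(is.na(x))) }

x <- c(NA, NA, 3, NA, 5)      # hypothetical input
is.na(x)                      # TRUE TRUE FALSE TRUE FALSE
cumprod(is.na(x))             # 1 1 0 0 0: later NAs no longer count

n_mixed   <- leadingNaCount(x)            # two leading NAs
n_none    <- leadingNaCount(c(1, 2, 3))   # no leading NAs
n_all     <- leadingNaCount(c(NA, NA))    # every element is NA
n_empty   <- leadingNaCount(numeric(0))   # empty input
```

Unlike the max()-based versions in the question, this needs no guard for the no-leading-NA case, because the sum of an all-zero (or empty) vector is simply 0.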