Using ifelse Within apply

Posted 2020-07-08 07:44

Question:

I am trying to make a new column in my dataset give a single output for each and every row, depending on the inputs from pre-existing columns.

In this output column, I want NA if any of the input values in a given row is 0. Otherwise (if none of the inputs is 0), I want the output for that row to be the number of unique values among the inputs.

I thought that the solution would use an ifelse function nested within an apply function, but I get an error that I do not understand.

data$output <- apply(data, 1, function(x) {ifelse(x == 0, NA, length(unique(x)))})

Error in `$<-.data.frame`(`*tmp*`, "output", value = c(3L, 3L, 3L, 3L, : replacement has 3 rows, data has 4

I do not know why the replacement has 3 rows, as I thought apply just applies the same function to each of my 4 rows.

Answer 1:

You want to check whether any of the variables in a row is 0, so you need to use any(x == 0) instead of x == 0 as the test in the ifelse call:

apply(data, 1, function(x) {ifelse(any(x == 0), NA, length(unique(x)))})
# [1]  1 NA  2

Basically, ifelse returns a vector of length n if its first argument has length n. You want one value per row, but x == 0 passes one test value per column of your data frame, so you get that many values back for every row. Wrapping the test in any() reduces it to a single TRUE or FALSE.
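To make the length behaviour concrete, here is a small sketch. The vector x is a made-up stand-in for one row of a data frame and is not taken from the question:

```r
# Hypothetical values standing in for a single row (an assumption for illustration)
x <- c(2, 0, 5)

# ifelse() is vectorised: the test x == 0 has three elements,
# so three results come back, one per element
per_element <- ifelse(x == 0, NA, length(unique(x)))
length(per_element)

# any() collapses the test to a single logical value,
# so ifelse() returns a single result
per_row <- ifelse(any(x == 0), NA, length(unique(x)))
length(per_row)
```

The first length is 3 and the second is 1, which is the difference between getting a matrix and getting a vector back from apply.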

Data:

(data <- data.frame(a=c(1, 2, 3), b=c(1, 0, 1)))
#   a b
# 1 1 1
# 2 2 0
# 3 3 1
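Since the test is a single value once any() is applied, a plain if / else does the same job as ifelse. The sketch below is one possible way to write the assignment on the example data; the name cols is hypothetical, and limiting apply to the input columns is my own assumption so that re-running the line does not feed the new output column back into the calculation:

```r
data <- data.frame(a = c(1, 2, 3), b = c(1, 0, 1))

# Hypothetical helper: the names of the input columns to inspect
cols <- c("a", "b")

# One value per row: NA if any input is 0, otherwise the count of unique inputs
data$output <- apply(data[cols], 1, function(x) {
  if (any(x == 0)) NA_integer_ else length(unique(x))
})

data
```

Using NA_integer_ keeps the column an integer vector whichever branch is taken.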


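Answer 2:

ifelse is vectorised: with n = length(x), it returns a vector of length n that holds NA wherever x == 0 and length(unique(x)) everywhere else. Because the function returns n values for each row, apply outputs a matrix with one row per input column and one column per row of data (in your case, 3 rows and 4 columns). data$output <- apply(...) then tries to assign that matrix to a single column of your data.frame, which has 4 rows. This is the cause of your error.

Your code will run if you just assign your output to a variable instead: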

out <- apply(data, 1, function(x) {ifelse(x == 0, NA, length(unique(x)))})

If you are expecting a vector as your output rather than a matrix, then there is something wrong with the logic of your function.
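The shape mismatch can be reproduced with a small sketch. The data frame df below is invented for illustration (4 rows, 3 columns, to match the counts in the error message) and is not the asker's data:

```r
# Hypothetical data with 4 rows and 3 input columns
df <- data.frame(a = c(1, 2, 3, 4),
                 b = c(2, 0, 1, 4),
                 c = c(3, 2, 0, 5))

# The function returns 3 values per row, so apply() builds a matrix
out <- apply(df, 1, function(x) ifelse(x == 0, NA, length(unique(x))))
is.matrix(out)
dim(out)   # 3 4: one row per input column, one column per row of df

# Assigning the 3-row matrix to a column of a 4-row data frame fails;
# tryCatch() captures the error message instead of stopping the script
res <- tryCatch({
  df$output <- out
  "assigned"
}, error = function(e) conditionMessage(e))
res
```

Checking dim(out) before assigning is a quick way to see whether the function returned one value per row or several.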