Replace row values conditional on is.na in another column

Posted 2020-05-01 09:47

Question:

A simple logical replacement problem. I have a data frame like:

mydf <- expand.grid(var1 = c('type1', 'type2'), var2 = c(7, 6, "NA"), var3 = 9)

I would like to replace the values in var3 with the values in var2 unless var2 is NA. So the resulting new var3 should be 7,7,6,6,NA,NA. In trying to get at this, I notice that

mydf$var3[mydf$var2 == 7] <- 5

correctly identifies rows 1 and 2 of mydf as needing replacement and leaves the last four rows alone, so I get var3 = 5,5,9,9,9,9. However, if I try

mydf$var3[!is.na(mydf$var2)] <- 5

I get var3 = 5,5,5,5,5,5. So why didn't it skip the last two rows, where var2 was NA? The next problem is that I don't know how to make the replacement values come from var2 instead of a constant. When I try

mydf$var3[!is.na(mydf$var2)] <- mydf$var2

I get var3 = 1,1,2,2,3,3, which I do not understand at all.
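A minimal base-R sketch of why this happens, re-running the question's own definition. Because c(7, 6, "NA") mixes numbers with a string, var2 becomes a character vector, and expand.grid() converts character vectors to factors by default (its stringsAsFactors argument defaults to TRUE). A factor is stored as integer codes (here 1, 2, 3 for the levels "7", "6", "NA"), and assigning it into the numeric var3 copies those codes rather than the labels.

```r
# Re-create the question's data frame; "NA" here is a string, not a missing value
mydf <- expand.grid(var1 = c('type1', 'type2'), var2 = c(7, 6, "NA"), var3 = 9)

class(mydf$var2)       # "factor": c(7, 6, "NA") is character, and expand.grid converts it
levels(mydf$var2)      # "7" "6" "NA"
as.integer(mydf$var2)  # 1 1 2 2 3 3, the underlying integer codes
is.na(mydf$var2)       # all FALSE, so every row is selected

# The factor's integer codes (not its labels) are what land in the numeric column
mydf$var3[!is.na(mydf$var2)] <- mydf$var2
mydf$var3              # 1 1 2 2 3 3
```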

Answer 1:

As noted in the comments, the string "NA" is not an NA value, so is.na("NA") is FALSE and all rows are selected. Just replace "NA" in your definition with NA.

mydf <- expand.grid(var1 = c('type1', 'type2'), var2 = c(7, 6, NA), var3 = 9)
mydf$var3[!is.na(mydf$var2)] <- mydf$var2[!is.na(mydf$var2)]
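A quick check of the corrected code (a sketch; `keep` is a helper name introduced here for readability, not part of the original answer). Rows where var2 is NA are not selected, so they keep their original var3 value of 9.

```r
# Corrected definition: NA is a real missing value, so var2 stays numeric
mydf <- expand.grid(var1 = c('type1', 'type2'), var2 = c(7, 6, NA), var3 = 9)

keep <- !is.na(mydf$var2)           # TRUE TRUE TRUE TRUE FALSE FALSE
mydf$var3[keep] <- mydf$var2[keep]  # 4 positions, 4 values: lengths match

mydf$var3                           # 7 7 6 6 9 9
```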

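Note that you can't just use mydf$var2 on the right-hand side, because the two sides now have unequal lengths (4 selected rows versus 6 values). R then warns that the number of items to replace is not a multiple of the replacement length and uses only the first 4 values. Before, you didn't get this warning because nothing was NA, so all 6 rows were selected.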
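A small base-R sketch of that length mismatch (`bad` and `keep` are illustrative names introduced here). The `ifelse()` line at the end is an alternative idiom, not part of the original answer: it builds a full-length result, so there is nothing to line up.

```r
mydf <- expand.grid(var1 = c('type1', 'type2'), var2 = c(7, 6, NA), var3 = 9)
keep <- !is.na(mydf$var2)   # 4 TRUE, 2 FALSE

# Full-length right-hand side: 4 positions selected, 6 values supplied.
# R does not stop; it warns "number of items to replace is not a multiple of
# replacement length" and uses only the first 4 values.
bad <- mydf$var3
bad[keep] <- mydf$var2      # emits the warning
bad                         # 7 7 6 6 9 9, right only because the NA rows come last

# Alternative (ifelse): keep var3 wherever var2 is NA, otherwise take var2
mydf$var3 <- ifelse(is.na(mydf$var2), mydf$var3, mydf$var2)
mydf$var3                   # 7 7 6 6 9 9
```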