Replace specific rows in data frame

Posted 2019-06-25 06:37

Question:

In the following data frame,

col1 <- c("g1","g2","g3",NA,"g4",NA)
col2 <- c(NA,"a1","a2",NA,"a3","a4")
df1 <-data.frame(col1,col2)

I would like to replace the NA entries in col1 with the values from the corresponding rows of col2. Is it correct to proceed by extracting the rows containing NA by

row <- which(is.na(col1))

and then extract the characters from col2 by

extract <- df1$col2[row]

After this I have no clue how to replace the NAs in col1 with the extracted characters. Please help!

Answer 1:

You don't need which(); is.na(df1$col1) alone is sufficient, since it gives a logical index. The only problem with the dataset is that both columns are of class factor because of how the data.frame was created (in R versions before 4.0.0, data.frame() converts strings to factors by default). It would be better to pass stringsAsFactors=FALSE to data.frame(..) to get character columns. Otherwise, if the levels being copied from col2 are not present in col1, the replacement gives this warning message:

# Warning message:
#In `[<-.factor`(`*tmp*`, is.na(df1$col1), value = c(1L, 2L, 3L,  :
#invalid factor level, NA generated
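
A minimal sketch of the stringsAsFactors=FALSE route mentioned above, using the data from the question. The name df2 is only illustrative (it is not in the original post) and is used so that df1 stays untouched.

```r
col1 <- c("g1", "g2", "g3", NA, "g4", NA)
col2 <- c(NA, "a1", "a2", NA, "a3", "a4")

# Build the data frame with character columns instead of factors
# (df2 is a hypothetical name used only for this sketch)
df2 <- data.frame(col1, col2, stringsAsFactors = FALSE)

# Logical index: TRUE where col1 is missing
indx <- is.na(df2$col1)

# Copy the col2 values into the missing col1 positions;
# no factor levels are involved, so no warning is raised
df2$col1[indx] <- df2$col2[indx]
```

With character columns, any string can be assigned, so the factor-level warning should not come up.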

Here, I am converting the columns to character class before proceeding with the replacement to avoid the above warning.

df1[] <- lapply(df1, as.character)
indx <- is.na(df1$col1)
df1$col1[indx] <- df1$col2[indx]
df1
#  col1 col2
#1   g1 <NA>
#2   g2   a1
#3   g3   a2
#4 <NA> <NA>
#5   g4   a3
#6   a4   a4
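
For comparison, the which()-based approach from the question gives the same result once the columns are character, and base R's ifelse() can do the replacement in one step. This is only a sketch; df3, rows, from_which and from_ifelse are illustrative names that do not appear in the original post.

```r
# Hypothetical copy of the question's data, built with character columns
df3 <- data.frame(
  col1 = c("g1", "g2", "g3", NA, "g4", NA),
  col2 = c(NA, "a1", "a2", NA, "a3", "a4"),
  stringsAsFactors = FALSE
)

# Integer positions of the missing values, as in the question
rows <- which(is.na(df3$col1))
from_which <- df3$col1
from_which[rows] <- df3$col2[rows]

# Vectorised alternative: take col2 where col1 is NA, otherwise keep col1
from_ifelse <- ifelse(is.na(df3$col1), df3$col2, df3$col1)

# Both approaches should agree
identical(from_which, from_ifelse)
```

Row 4 stays NA under either approach because col2 is also NA in that row.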


Tags: r dataframe rows