R conditional replace more columns by lookup

Lets say we do have lots of data columns (with names mycols and also some unnamed ones that should not be processed in this case) in dataframe df1 and a column subj which is also an index to another dataframe df2 with columns repl and subj (in this second dataframe is subj unique) and much other nonimportant columns (their only role in this is, that we cannot suppose that there are just 2 columns).

I would like to replace a subset of columns ( df1[,mycols] ) in such a way, that if there is an NA ( df1[,mycols][is.na(df1[,mycols])] ) <- replace by a value of column df2$repl where the row in df2 has df2$subj = df1$subj.

EDIT: example data (I dont know the command to write it as dataframe assignment):

mycols = c("a","b")
df1:
subj a  b  c
1    NA NA 1
1    2  3  5
2    0  NA 2
3    8  8  8
df2:
subj repl notinterested
1     5    1000
2     6    0
3     40   10
result:
df1-transformed-to:
subj a  b  c
1    5  5  1      #the 2 fives appeared by lookup
1    2  3  5
2    0  6  2     #the 6 appeared
3    8  8  8

I came up with the following code:

df1[,mycols][is.na(df1[,mycols])] <- df2[match( df1$subj, df2$subj),"repl"]

But the problem is (I think), that the right side is not the same size as the left side - I think it might work for one column in "mycols", but I want to do the same operation with all mycols (If NA, look to table df2 and replace - the replacing value is the same in the scope of the row).

(Also I need to enumerate the columns by names mycols explicitely everythime, because there might be another columns)

As a miniquestion as bonus about programming style - what is, in R, a good and a fast way to write this operation? If it would be a procedural language, we could transform

df1[,mycols][is.na(df1[,mycols])]

into an approach I consider more nice and more readable:

function(x){ *x[is.na(*x)] }
function(& df1[,mycols])

and being sure, that nothing gets unnecessarily copied from place to place.

标签： r dataframe lookup na

3条回答

放荡不羁爱自由

2楼-- · 2019-05-22 18:18

Using your code, we need to replicate the 'repl' column to make the two subset datasets equal and then assign the values as you did

 val <- df2$repl[match(df1$subj, df2$subj)][row(df1[mycols])][is.na(df1[mycols])]
 df1[mycols][is.na(df1[mycols])] <- val
 df1
 #  subj a b c
 #1    1 5 5 1
 #2    1 2 3 5
 #3    2 0 6 2
 #4    3 8 8 8

Another option using data.table

 library(data.table)#v1.9.5+
 DT <- setDT(df1, key='subj')[df2[c('subj', 'repl')]]
 for(j in mycols){
   i1 <- which(is.na(DT[[j]]))
   set(DT, i=i1, j=j, value= DT[['repl']][i1])
   }
 DT[,repl:= NULL]
 #   subj a b c
 #1:    1 5 5 1
 #2:    1 2 3 5
 #3:    2 0 6 2
 #4:    3 8 8 8

Or with dplyr

 library(dplyr)
 left_join(df1, df2, by='subj') %>%
        mutate_each_(funs(ifelse(is.na(.),repl,.)), mycols) %>% 
        select(a:c)
 #  a b c
 #1 5 5 1
 #2 2 3 5
 #3 0 6 2
 #4 8 8 8

data

 df1 <-  structure(list(subj = c(1L, 1L, 2L, 3L), a = c(NA, 2L, 0L, 8L 
 ), b = c(NA, 3L, NA, 8L), c = c(1L, 5L, 2L, 8L)), .Names = c("subj", 
 "a", "b", "c"), class = "data.frame", row.names = c(NA, -4L))

 df2 <- structure(list(subj = 1:3, repl = c(5L, 6L, 40L),
 notinterested = c(1000L, 
 0L, 10L)), .Names = c("subj", "repl", "notinterested"), 
 class = "data.frame", row.names = c(NA, -3L))

0人赞添加讨论(0) 举报

啃猪蹄的小仙女

3楼-- · 2019-05-22 18:30

Here's a possible solution using ifelse():

mycols <- c('a','b');
df1 <- data.frame(subj=c(1,1,2,3), a=c(NA,2,0,8), b=c(NA,3,NA,8), c=c(1,5,2,8) );
df2 <- data.frame(subj=c(1,2,3), repl=c(5,6,40), notinterested=c(1000,0,10) );
df1[mycols] <- ifelse(is.na(df1[mycols]),matrix(df2[match(df1$subj,df2$subj),'repl'],nrow(df1),length(mycols)),as.matrix(df1[mycols]));
df1;
##   subj a b c
## 1    1 5 5 1
## 2    1 2 3 5
## 3    2 0 6 2
## 4    3 8 8 8

0人赞添加讨论(0) 举报

老娘就宠你

4楼-- · 2019-05-22 18:42

One way of doing this with base R:

mycols = c("a","b")
df1 <- read.table(text="subj a  b  c
1    NA NA 1
1    2  3  5
2    0  NA 2
3    8  8  8", header = TRUE)
df2 <- read.table(text="subj repl notinterested
1     5    1000
2     6    0
3     40   10", header = TRUE)
df1[mycols] <- lapply(df1[mycols], function(x) {
  x[is.na(x)] <- df2$repl[match(df1$subj[is.na(x)], df2$subj)]; x})

0人赞添加讨论(0) 举报

R conditional replace more columns by lookup

data

采纳回答

编辑标签

举报内容

检举类型

检举原因

检举说明(必填)

打开微信“扫一扫”，打开网页后点击屏幕右上角分享按钮

付费偷看金额在0.1-10元之间