Let's say we have many data columns in a data frame df1: some are named in mycols, and there are others that should not be processed in this case. df1 also has a column subj, which serves as a key into another data frame df2 with columns repl and subj (subj is unique in df2) plus many other unimportant columns (their only role here is that we cannot assume df2 has just 2 columns).
I would like to replace values in a subset of columns (df1[,mycols]) so that every NA (df1[,mycols][is.na(df1[,mycols])]) is replaced by the value of df2$repl from the row of df2 where df2$subj equals df1$subj.
EDIT: example data (I don't know the command to write it as a data frame assignment):
mycols = c("a","b")
df1:
subj a b c
1 NA NA 1
1 2 3 5
2 0 NA 2
3 8 8 8
df2:
subj repl notinterested
1 5 1000
2 6 0
3 40 10
result:
df1-transformed-to:
subj a b c
1 5 5 1 #the 2 fives appeared by lookup
1 2 3 5
2 0 6 2 #the 6 appeared
3 8 8 8
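For reference, the example could be written as data frame assignments along these lines. This is only a sketch: the values are entered as plain numerics, and the name `expected` is introduced here purely for illustration.

```r
mycols <- c("a", "b")

# Table to be patched; NA marks the cells to look up
df1 <- data.frame(
  subj = c(1, 1, 2, 3),
  a    = c(NA, 2, 0, 8),
  b    = c(NA, 3, NA, 8),
  c    = c(1, 5, 2, 8)
)

# Lookup table; subj is unique here
df2 <- data.frame(
  subj          = c(1, 2, 3),
  repl          = c(5, 6, 40),
  notinterested = c(1000, 0, 10)
)

# Desired result (hypothetical name, used only for comparison)
expected <- data.frame(
  subj = c(1, 1, 2, 3),
  a    = c(5, 2, 0, 8),
  b    = c(5, 3, 6, 8),
  c    = c(1, 5, 2, 8)
)
```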
I came up with the following code:
df1[,mycols][is.na(df1[,mycols])] <- df2[match( df1$subj, df2$subj),"repl"]
But the problem (I think) is that the right side is not the same size as the left side. I think it might work for a single column in mycols, but I want to do the same operation on all of mycols (if NA, look it up in df2 and replace; the replacement value is the same across the whole row).
(Also, I need to list the columns by name via mycols explicitly every time, because there might be other columns.)
As a bonus mini-question about programming style: what is a good and fast way to write this operation in R? In a procedural language, we could transform
df1[,mycols][is.na(df1[,mycols])]
into an approach I consider nicer and more readable:
function(x){ *x[is.na(*x)] }
function(& df1[,mycols])
while being sure that nothing gets unnecessarily copied from place to place.
Using your code, we need to replicate the 'repl' column so that both sides of the assignment are the same size, and then assign the values as you did.
Another option, using data.table:
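A minimal sketch of what a data.table version might look like, assuming the example data from the question (rebuilt here as `dt1` and `dt2`, which are illustrative names). It loops over `mycols` and uses `set()`, which updates the table by reference, so the whole table should not need to be copied for each column.

```r
library(data.table)

mycols <- c("a", "b")
dt1 <- data.table(subj = c(1, 1, 2, 3), a = c(NA, 2, 0, 8),
                  b = c(NA, 3, NA, 8), c = c(1, 5, 2, 8))
dt2 <- data.table(subj = c(1, 2, 3), repl = c(5, 6, 40),
                  notinterested = c(1000, 0, 10))

# One replacement value per row of dt1, looked up by subj
repl <- dt2$repl[match(dt1$subj, dt2$subj)]

# Fill the NAs column by column; set() modifies dt1 in place
for (col in mycols) {
  idx <- which(is.na(dt1[[col]]))
  if (length(idx)) {
    set(dt1, i = idx, j = col, value = repl[idx])
  }
}
```

Columns outside `mycols` (here `c`) are never touched, because the loop only visits the listed names.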
Or with dplyr:
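One way this might be written with dplyr, as a sketch rather than a definitive version: join `repl` onto `df1`, fill the NAs in `mycols` with `coalesce()`, then drop the helper column again. It assumes dplyr 1.0 or later (for `across()`) and that `df1` has no column of its own called `repl`.

```r
library(dplyr)

mycols <- c("a", "b")
df1 <- data.frame(subj = c(1, 1, 2, 3), a = c(NA, 2, 0, 8),
                  b = c(NA, 3, NA, 8), c = c(1, 5, 2, 8))
df2 <- data.frame(subj = c(1, 2, 3), repl = c(5, 6, 40),
                  notinterested = c(1000, 0, 10))

result <- df1 %>%
  # bring the matching repl value alongside each row of df1
  left_join(select(df2, subj, repl), by = "subj") %>%
  # in the chosen columns only, replace NA by repl from the same row
  mutate(across(all_of(mycols), ~ coalesce(.x, repl))) %>%
  # remove the helper column again
  select(-repl)
```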
data
Here's a possible solution using ifelse():
One way of doing this with base R:
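The two base R ideas mentioned above (replicating `repl` so the sizes match, and `ifelse()` per column) could look roughly like this. It is a sketch on the question's example data; `res1`, `res2`, `na_mask` and `repl_mat` are names made up for illustration.

```r
mycols <- c("a", "b")
df1 <- data.frame(subj = c(1, 1, 2, 3), a = c(NA, 2, 0, 8),
                  b = c(NA, 3, NA, 8), c = c(1, 5, 2, 8))
df2 <- data.frame(subj = c(1, 2, 3), repl = c(5, 6, 40),
                  notinterested = c(1000, 0, 10))

# One replacement value per row of df1, looked up by subj
repl <- df2$repl[match(df1$subj, df2$subj)]

# Variant 1: replicate repl once per column so that the mask and the
# replacement values have the same shape
na_mask  <- is.na(df1[mycols])   # logical matrix, rows x length(mycols)
repl_mat <- matrix(repl, nrow = nrow(df1), ncol = length(mycols))
res1 <- df1
res1[mycols][na_mask] <- repl_mat[na_mask]

# Variant 2: ifelse() applied to each selected column
res2 <- df1
res2[mycols] <- lapply(df1[mycols], function(x) ifelse(is.na(x), repl, x))
```

Variant 1 stays closest to the code in the question; the only change is that the right-hand side is expanded to one value per cell and then subset with the same NA mask as the left-hand side.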