I hope this one isn't stupid.
I have two data frames with the variables ID and gender/sex. In df1, the variable contains NAs. In df2, it is complete. I want to fill in the column in df1 with the values from df2. (In df1 the variable is called "gender". In df2 it is called "sex".)
Here is what I tried so far:
#example-data
ID<-seq(1,30,by=1)
df1<-as.data.frame(ID)
df2<-df1
df1$gender<-c(NA,"2","1",NA,"2","2","2","2","2","2",NA,"2","1","1",NA,"2","2","2","2","2","1","2","2",NA,"2","2","2","2","2",NA)
df2$sex<-c("2","2","1","2","2","2","2","2","2","2","2","2","1","1","2","2","2","2","2","2","1","2","2","2","2","2","2","2","2","2")
#Approach 1:
NAs.a <- is.na(df1$gender)
df1$gender[NAs.a] <- df2[match(df1$ID[NAs.a], df2$ID),]$sex
#Approach 2 (I like dplyr a lot, perhaps there's a way to use it):
library("dplyr")
temp<-df2 %>% select(ID,sex) #the column in df2 is called "sex", not "gender"
#EDIT:
#df<-left_join(df1$gender,df2$gender, by="ID")
df<-left_join(df1,df2, by="ID")
Thank you very much.
You could do
This assumes - as in the example - that all IDs from df1 are also present in df2 and have sex/gender information there.
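Under that assumption, one way to do it with dplyr might look like the sketch below. It is an illustration rather than a definitive solution: the data frames are a shortened stand-in for the example data, and `result` is just a name chosen here. Because every ID is assumed to have a value in df2, the gender column from df1 is simply dropped and rebuilt from df2$sex.

```r
library(dplyr)

# Shortened stand-in for the example data (same column names, fewer rows)
df1 <- data.frame(ID = 1:5, gender = c(NA, "2", "1", NA, "2"),
                  stringsAsFactors = FALSE)
df2 <- data.frame(ID = 1:5, sex = c("2", "2", "1", "2", "2"),
                  stringsAsFactors = FALSE)

# Keep only ID from df1, pull sex from df2 by ID, then rename it to gender
result <- df1 %>%
  select(ID) %>%
  left_join(df2, by = "ID") %>%
  rename(gender = sex)
```

Note that this overwrites every value of gender, not only the NAs, so it is only appropriate when df2 is the trusted source.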
If you have other columns in your data you could also try this instead:
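A possible sketch for that case, assuming df1 carries an extra column (the `age` column here is made up for illustration): join only ID and sex from df2, fill the gaps with `coalesce()`, and drop the helper column again. This variant keeps the values already present in df1 and only fills the NAs.

```r
library(dplyr)

# Stand-in data; "age" is a hypothetical extra column that should stay untouched
df1 <- data.frame(ID = 1:5, gender = c(NA, "2", "1", NA, "2"),
                  age = c(30, 41, 25, 37, 52), stringsAsFactors = FALSE)
df2 <- data.frame(ID = 1:5, sex = c("2", "2", "1", "2", "2"),
                  stringsAsFactors = FALSE)

result <- df1 %>%
  left_join(select(df2, ID, sex), by = "ID") %>%  # bring in sex by ID
  mutate(gender = coalesce(gender, sex)) %>%      # use sex only where gender is NA
  select(-sex)                                    # drop the helper column
```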
Here's a quick solution using data.table's binary join. This will join only gender with sex and leave all the rest of the columns untouched.
This would probably be the simplest with base R.
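A minimal base R sketch of that idea, assuming (as in the example) that every ID in df1 also appears in df2: `match()` looks up the position of each df1 ID in df2, and those positions index df2$sex. The data below is a shortened stand-in for the example data.

```r
# Shortened stand-in for the example data
df1 <- data.frame(ID = 1:5, gender = c(NA, "2", "1", NA, "2"),
                  stringsAsFactors = FALSE)
df2 <- data.frame(ID = 1:5, sex = c("2", "2", "1", "2", "2"),
                  stringsAsFactors = FALSE)

# Replace the whole gender column with the matching sex values from df2
df1$gender <- df2$sex[match(df1$ID, df2$ID)]
```

For comparison, the data.table version is usually written as an update join along the lines of `setDT(df1)[df2, gender := i.sex, on = "ID"]`, which likewise modifies only the gender column.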