I have a data frame that I'd like to analyze. The problem is that instead of Yes/No, the values are coded as 1s and 0s (1 being Yes, 0 being No). How do I modify the data frame so that the 1s and 0s become Yes and No, so I can use logistic regression? I am sure there is a simple fix for this that I am not thinking of.
Thanks!
Use factor (see ?factor). See this example:
> set.seed(1)
> dummyVariable <- sample(c(0,1), 10, TRUE) # bunch of 0 and 1
> newVariable <- factor(dummyVariable, levels=c(0,1), labels=c("No", "Yes"))
> newVariable # this is now a factor, ready for regression analysis
[1] No No Yes Yes No Yes Yes Yes Yes No
Levels: No Yes
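To connect this to the original question, here is a minimal sketch of the same recoding applied to a data frame column and then passed to glm. The data frame df and its columns answer and x are made up for illustration; substitute your own names.

```r
# Hypothetical data: 'answer' is coded 0/1, 'x' is a numeric predictor
df <- data.frame(
  answer = c(0, 1, 0, 1, 1, 0, 1, 0, 1, 1),
  x      = c(2.1, 3.5, 1.8, 2.0, 4.2, 3.9, 4.0, 2.5, 3.1, 2.8)
)

# Recode the 0/1 column in place as a factor labelled No/Yes
df$answer <- factor(df$answer, levels = c(0, 1), labels = c("No", "Yes"))

# With a factor response, the binomial family treats the first level
# ("No") as failure and the other level ("Yes") as success
fit <- glm(answer ~ x, data = df, family = binomial)
coef(fit)
```

As a side note, glm with family = binomial also accepts a numeric 0/1 response directly, so the recoding mainly helps readability of summaries and tables.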
You can also just use your values as indices into the c('no','yes') vector, adding 1 because your values start at 0 and R indexes from 1. This generalizes easily to more than two values, which wouldn't work so well with ifelse:
c('no','yes')[df$col+1]
or
factor(c('no','yes')[df$col+1],c('no','yes'))
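The generalization to more than two values could look roughly like this. The codes vector and the three labels are invented for the example; the only assumption is that the codes are consecutive integers starting at 0.

```r
# Hypothetical column coded 0, 1, 2
codes  <- c(0, 2, 1, 1, 0, 2)
labels <- c("no", "maybe", "yes")

# Codes start at 0 and R indexes from 1, so shift by one before indexing
recoded <- factor(labels[codes + 1], levels = labels)
recoded
```

Passing levels = labels keeps the levels in the order of the original codes rather than alphabetical order.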
Another way to get a factor out of this:
factor(ifelse(dummyVariable, 'Yes', 'No'))
Try using gsub.
dummyVariable<-gsub(0,"No",dummyVariable)
dummyVariable<-gsub(1,"Yes",dummyVariable)
dummyVariable
# [1] "No" "No" "Yes" "Yes" "No" "Yes" "Yes" "Yes" "Yes" "No"
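Two caveats with the gsub approach: the result is a character vector rather than a factor, and gsub replaces every matching digit, so a value like 10 would come out as "YesNo". A slightly more defensive sketch, assuming the same kind of 0/1 vector (the values below are made up), anchors the patterns and converts to a factor at the end:

```r
dummyVariable <- c(0, 0, 1, 1, 0, 1)   # made-up 0/1 values

# Anchored patterns match only a whole "0" or "1", not digits inside e.g. "10"
recoded <- sub("^0$", "No", dummyVariable)
recoded <- sub("^1$", "Yes", recoded)

# sub/gsub return character; convert to a factor with an explicit level order
recoded <- factor(recoded, levels = c("No", "Yes"))
recoded
```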