This question already has an answer here:
- All Levels of a Factor in a Model Matrix in R
I need to create a new data frame nDF that binarizes all categorical variables in a data frame DF while retaining all of its other variables. For example, I have the following feature variables: RACE (a factor with four levels) and AGE, plus an output variable called CLASS.
DF =
         RACE       AGE (BELOW 21)  CLASS
Case 1   HISPANIC   0               A
Case 2   ASIAN      1               A
Case 3   HISPANIC   1               D
Case 4   CAUCASIAN  1               B
I want to convert this into nDF with five (5) variables, or even four (4):
         RACE.1  RACE.2  RACE.3  AGE (BELOW 21)  CLASS
Case 1   0       0       0       0               A
Case 2   0       0       1       1               A
Case 3   0       0       0       1               D
Case 4   0       1       0       1               B
I am familiar with applying treatment contrasts to the variable DF$RACE. However, if I run
contrasts(DF$RACE) = contr.treatment(4)
what I get is still a DF with three variables; the only change is that DF$RACE now carries a "contrasts" attribute.
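To make this reproducible, here is a minimal sketch of the example data and the call above. The column name AGE_BELOW_21 and the fourth race level "OTHER" are placeholders I made up for illustration (the sample rows only show three of the four levels), and the level order is an assumption chosen so that HISPANIC is the baseline.

```r
# Hypothetical reconstruction of DF. "OTHER" and AGE_BELOW_21 are
# placeholder names, not taken from the real data.
DF <- data.frame(
  RACE = factor(c("HISPANIC", "ASIAN", "HISPANIC", "CAUCASIAN"),
                levels = c("HISPANIC", "OTHER", "CAUCASIAN", "ASIAN")),
  AGE_BELOW_21 = c(0, 1, 1, 1),
  CLASS = factor(c("A", "A", "D", "B")),
  row.names = paste("Case", 1:4)
)

# Attach treatment contrasts to the factor.
contrasts(DF$RACE) <- contr.treatment(4)

# The data frame itself is unchanged in shape ...
n_vars <- ncol(DF)

# ... the contrast matrix is only stored as an attribute of the factor.
race_contrasts <- attr(DF$RACE, "contrasts")
print(n_vars)
print(race_contrasts)
```

So the coding information is there, but it is not expanded into separate columns of DF.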
What I ultimately want, though, is a new data frame nDF as illustrated above. Building it by hand would be very tedious with around 50 feature variables, more than five (5) of which are categorical.
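For concreteness, here is a rough sketch of the kind of result I am after, assuming base R's model.matrix() is an acceptable way to expand the factors. The data, the placeholder level "OTHER", and the column name AGE_BELOW_21 are hypothetical, and the generated column names (RACEOTHER, etc.) differ from the RACE.1, RACE.2, RACE.3 labels in my illustration.

```r
# Hypothetical example data; "OTHER" and AGE_BELOW_21 are placeholders.
DF <- data.frame(
  RACE = factor(c("HISPANIC", "ASIAN", "HISPANIC", "CAUCASIAN"),
                levels = c("HISPANIC", "OTHER", "CAUCASIAN", "ASIAN")),
  AGE_BELOW_21 = c(0, 1, 1, 1),
  CLASS = factor(c("A", "A", "D", "B")),
  row.names = paste("Case", 1:4)
)

# Keep the output variable out of the expansion.
predictors <- DF[, setdiff(names(DF), "CLASS"), drop = FALSE]

# model.matrix() expands each factor into 0/1 columns using its contrasts
# (treatment contrasts by default) and passes numeric columns through.
# The first column is the intercept, which is dropped here.
X <- model.matrix(~ ., data = predictors)[, -1, drop = FALSE]

# Reattach the output variable to get the binarized data frame.
nDF <- data.frame(X, CLASS = DF$CLASS, check.names = FALSE)
print(nDF)
```

Because the formula is `~ .`, this would handle every factor column in one call rather than one variable at a time, which is what matters with around 50 features.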