I have the following data frame which is created below:
temp <- as.data.frame(with(uadm, table(prlo_state_code)))
I am looking to create 11 dummy variables: one for each of the top 10 and an 'other'. The top 10 can easily be found with:
#top10
temp <- temp[order(temp$Freq, decreasing=TRUE),]
head(temp, n=10)
I know R is great, so I am assuming there is an easy way to auto-create (and name) the dummy variables from the top 10 and collapse the rest into a final dummy called 'other.'
Thanks in advance for any help or insight.
You rarely need dummy variables -- R silently creates them for you.
If you just want to put all the classes that are not in the top 10 together,
you can simply use ifelse and %in%.
x <- sample( LETTERS, 1e4, replace=TRUE, prob=runif(26) )
top10 <- names( sort(table(x), decreasing=TRUE)[1:10] )
y <- ifelse( x %in% top10, as.character(x), "Rest" )
table(y)
If you absolutely need dummy variables, you can create them with model.matrix.
model.matrix(~y)
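Note that model.matrix(~y) returns an intercept column plus 10 indicator columns, because the first level of y is absorbed as the baseline. If all 11 indicator columns are wanted (one per top-10 class plus one for the remainder), dropping the intercept is one way to get them. A minimal sketch, assuming the same kind of simulated data as above; the seed and the name dummies are only for illustration:

```r
set.seed(1)  # arbitrary seed, only so the sketch is reproducible
x <- sample(LETTERS, 1e4, replace = TRUE, prob = runif(26))
top10 <- names(sort(table(x), decreasing = TRUE)[1:10])
y <- factor(ifelse(x %in% top10, x, "Rest"))

# "~ 0 + y" removes the intercept, so every level of y gets its own 0/1 column
dummies <- model.matrix(~ 0 + y)
dim(dummies)        # 10000 rows, 11 columns
colnames(dummies)   # level names prefixed with "y", including "yRest"
```

Each row of dummies contains exactly one 1, so the columns can be bound to the original data with cbind if separate variables are really needed.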
R's regression functions build the necessary columns of the model matrix when a factor variable is entered in a formula. It's all automatic. The default contrast is between the first factor level and each of the other levels, the so-called "treatment contrasts". Other choices are possible.
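To illustrate the baseline and the "other choices", here is a small sketch on a toy factor (the data and the names y2 and m_sum are made up for the example). It shows the default treatment coding, a change of reference level with relevel, and sum-to-zero coding passed through the contrasts.arg argument of model.matrix:

```r
y <- factor(c("A", "B", "Rest", "A", "Rest", "B"))  # toy factor, illustration only

# Default treatment contrasts: the first level ("A") is the baseline
colnames(model.matrix(~ y))    # "(Intercept)" "yB" "yRest"

# Make "Rest" the baseline instead
y2 <- relevel(y, ref = "Rest")
colnames(model.matrix(~ y2))   # "(Intercept)" "y2A" "y2B"

# Sum-to-zero contrasts instead of treatment contrasts
m_sum <- model.matrix(~ y, contrasts.arg = list(y = "contr.sum"))
colnames(m_sum)                # "(Intercept)" "y1" "y2"
```

With the collapsed variable from the earlier answer, relevel(factor(y), ref = "Rest") would make the pooled category the reference level, which is often the more natural comparison.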