Automatic Dummy Variables in R

Posted 2019-02-20 04:14

Question:

I have a data frame of frequency counts, created as follows:

temp <- as.data.frame(with(uadm, table(prlo_state_code)))

I am looking to create 11 dummy variables: one for each of the 10 most frequent codes and one 'other'. The top 10 can easily be found with:

# top 10 codes by frequency
temp <- temp[order(temp$Freq, decreasing = TRUE), ]
head(temp, n = 10)

I know R is great, so I am assuming there is an easy way to auto-create (and name) the dummy variables from the top 10 and collapse the rest into a final dummy called 'other'.

Thanks in advance for any help or insight.

Answer 1:

You rarely need to build dummy variables yourself -- R's modelling functions silently create them for you when a factor appears in a model formula.

If you just want to put all the classes that are not in the top 10 together, you can simply use ifelse and %in%.

x <- sample( LETTERS, 1e4, replace=TRUE, prob=runif(26) )
top10 <- names( sort(table(x), decreasing=TRUE)[1:10] )
y <- ifelse( x %in% top10, as.character(x), "Rest" )
table(y)

If you absolutely need dummy variables, you can create them with model.matrix.

model.matrix(~y) 
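Note that `model.matrix(~y)` returns an intercept column and drops the first level, so it yields 10 indicator columns rather than 11. If one 0/1 column per group is wanted (the top 10 plus the collapsed group), a minimal sketch is to remove the intercept from the formula. The seed, the object name `dummies`, and the column renaming step are illustrative assumptions, not part of the original answer.

```r
set.seed(1)  # assumed seed, only so the sketch is reproducible
x <- sample(LETTERS, 1e4, replace = TRUE, prob = runif(26))
top10 <- names(sort(table(x), decreasing = TRUE)[1:10])
y <- factor(ifelse(x %in% top10, x, "Rest"))

# "~ 0 + y" removes the intercept, so every level gets its own 0/1 column
dummies <- model.matrix(~ 0 + y)

# Column names come out as "yA", "yB", ...; strip the leading "y"
colnames(dummies) <- sub("^y", "", colnames(dummies))

dim(dummies)
head(dummies)
```

Each row then has exactly one 1, and the columns can be attached to the original data with `cbind()` if separate variables are really required.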


Answer 2:

R's regression functions build the necessary columns of the model matrix when a factor-classed variable is entered in a formula. It's all automatic. The default contrast is between the first factor level and each of the other levels, the so-called "treatment contrasts". Other choices are possible.
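To make the point about contrasts concrete, here is a small sketch on a toy factor `f` (a hypothetical example, not data from the question). It assumes the default `options("contrasts")` setting has not been changed, and uses `contr.sum` as one example of the "other choices".

```r
f <- factor(c("a", "b", "c", "a", "b", "c"))

# Default for unordered factors: treatment contrasts,
# with the first level ("a") as the baseline row of zeros
contrasts(f)

# The design matrix has an intercept plus one column per non-baseline level
mm_trt <- model.matrix(~ f)
colnames(mm_trt)

# One alternative: sum-to-zero contrasts, requested per factor
mm_sum <- model.matrix(~ f, contrasts.arg = list(f = "contr.sum"))
colnames(mm_sum)
```

The same `contrasts.arg`-style choice can be passed to `lm()` through its `contrasts` argument, so the coding can be changed without building the columns by hand.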