R: Expanding an R factor into dummy columns for ev

Posted 2020-03-30 02:29

I have a fairly large data frame in R with two columns. I am trying to create dummy variables from the Code column (a factor with 858 levels), but RStudio crashes every time I try.

> str(d)
'data.frame':   649226 obs. of  2 variables:
 $ User: int  210 210 210 210 269 317 317 317 317 326 ...
 $ Code      : Factor w/ 858 levels "AA02","AA03",..: 164 494 538 626 464 496 435 464 475 163 ... 

The User column is not unique: several rows can share the same User. It doesn't matter whether the result keeps the same number of rows, or whether rows with the same User are merged into a single row whose columns hold the count of each Code.

I found a couple of solutions that work for smaller datasets, but not for mine.

It would be great if you could recommend a method that is fast and works for this kind of data.

Thanks!

1 answer
女痞
#2 · 2020-03-30 03:08

This worked for me perfectly:

library(reshape2)
# Cast the long data frame into a matrix: one row per User, one column per Code
m <- acast(data = d, User ~ Code)

The only issue was that it produced NAs instead of 0s, but that is easy to fix:

m[is.na(m)] <- 0
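
For reference, a dense matrix with one row per observation would hold 649226 × 858, roughly 557 million cells, which may be part of why RStudio struggles. Below is a minimal sketch of a sparse alternative, not the method used above. It assumes the Matrix package (shipped with standard R installations) is available, and it uses a small made-up data frame in place of the real `d`.

```r
# Toy stand-in for the data frame `d` from the question (values are made up).
d <- data.frame(
  User = c(210L, 210L, 210L, 269L, 317L),
  Code = factor(c("AA02", "AA03", "AA02", "AA03", "AB01"))
)

# Variant 1: one row per User, one column per Code, cells hold counts.
# sparse = TRUE returns a sparse matrix from the Matrix package, which
# stores only the non-zero cells; absent combinations read back as 0.
counts <- xtabs(~ User + Code, data = d, sparse = TRUE)

# Variant 2: one row per original observation, one 0/1 column per Code level.
# "- 1" drops the intercept so that every level gets its own column.
dummies <- Matrix::sparse.model.matrix(~ Code - 1, data = d)

print(counts)
print(dim(dummies))
```

Either object can be converted with `as.matrix()` if a dense result is needed and fits in memory.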