-->

Create a matrix of dummy variables from my data fr

2019-07-18 12:47发布

问题:

I have a data based on different years, repeated several time. I want my output having columns equal to number of years, each column for one year. Now, the purpose is to create dummy for each year separately. For example, the output column for year 2000 must have a value "1" whenever there is a non-NA observation in the main data parallel to year 2000, else "0". Moreover, NA must remain NA. Please see below a small sample of input data:

df:
2000    NA
2001    NA
2002   -1.3
2000    1.1
2001    0
2002    NA
2000   -3
2001    3
2002    4.1

Now the output should be:

df1:
2000    2001    2002
 NA      NA      NA
 NA      NA      NA
 0       0       1
 1       0       0
 0       1       0
 NA      NA      NA
 1       0       0
 0       1       0
 0       0       1

I would prefer to obtain this output by using a "for loop", if possible. Otherwise, any simpler approach will be appreciated.

回答1:

No loop is needed. We can use model.matrix:

## your data variable and NA index
x <- c(NA, NA, -1.3, 1.1, 0, NA, -3, 3, 4.1)
na_id <- is.na(x)

## code your year variable as a factor
year <- factor(rep(2000:2002, 3))

## original model matrix; drop intercept to disable contrast
X <- model.matrix(~ year - 1)

#  year2000 year2001 year2002
#1        1        0        0
#2        0        1        0
#3        0        0        1
#4        1        0        0
#5        0        1        0
#6        0        0        1
#7        1        0        0
#8        0        1        0
#9        0        0        1

## put NA where `x` is NA (we have used recycling rule here)
X[na_id] <- NA

#  year2000 year2001 year2002
#1       NA       NA       NA
#2       NA       NA       NA
#3        0        0        1
#4        1        0        0
#5        0        1        0
#6       NA       NA       NA
#7        1        0        0
#8        0        1        0
#9        0        0        1

Matrix X will have some attributes. You can drop them if you want:

attr(X, "assign") <- attr(X, "contrasts") <- NULL

You can also rename the column names of this matrix to something else, like

colnames(X) <- 2000:2002