R: include factors with no entries when using dcast

Posted 2019-07-01 19:57

I am using the reshape2 function dcast on a data frame. One of the variables is a factor where some of the levels do not appear in the data, but I would like to include all levels in the new columns that are created.

For example, say I run the following:

library(reshape2)
dataDF <- data.frame(
  id = 1:6,
  id2 = c(1,2,3,1,2,3),
  x = c(rep('t1', 3), rep('t2', 3)),
  y = factor(c('A', 'B', 'A', 'B', 'B', 'C'), levels = c('A', 'B', 'C', 'D')),
  value = rep(1)
)

dcast(dataDF, id + id2 ~ x + y, fill = 0)

I get the following

  id id2 t1_A t1_B t2_B t2_C
1  1   1    1    0    0    0
2  2   2    0    1    0    0
3  3   3    1    0    0    0
4  4   1    0    0    1    0
5  5   2    0    0    1    0
6  6   3    0    0    0    1

But I also want to include the columns t1_C, t1_D, t2_A and t2_D full of 0's

i.e. I want the following

  id id2 t1_A t1_B t1_C t1_D t2_A t2_B t2_C t2_D
1  1   1    1    0    0    0    0    0    0    0
2  2   2    0    1    0    0    0    0    0    0
3  3   3    1    0    0    0    0    0    0    0
4  4   1    0    0    0    0    0    1    0    0
5  5   2    0    0    0    0    0    1    0    0
6  6   3    0    0    0    0    0    0    1    0

Also, as an aside, would it be possible to create the above without having the column 'value' full of ones in the initial data frame? Basically I just want to cast x and y into their own columns, with a 1 if that combination exists for that id.

Thanks in advance

EDIT: I initially had one variable on the LHS, which Jeremy's answer below addresses, but I actually have more than one variable on the LHS, so I have edited the question to reflect this.

Tags: r reshape2
1 answer
ゆ 、 Hurt°
Reply #2 · 2019-07-01 20:36

Try adding drop = FALSE to your dcast call, so that unused factor levels are not dropped:

dcast(dataDF, id ~ x + y, fill = 0, drop = FALSE)

  id t1_A t1_B t1_C t1_D t2_A t2_B t2_C t2_D
1  1    1    0    0    0    0    0    0    0
2  2    0    1    0    0    0    0    0    0
3  3    1    0    0    0    0    0    0    0
4  4    0    0    0    0    0    1    0    0
5  5    0    0    0    0    0    1    0    0
6  6    0    0    0    0    0    0    1    0

For your aside: yes. You just need to tell dcast how to aggregate by passing a function. In this case you want length, which counts the rows that fall into each cell:

# keep only id, x and y (with id2 in the data, columns 1:3 would drop y)
data2 <- dataDF[, c("id", "x", "y")]
dcast(data2, id ~ x + y, fill = 0, drop = FALSE, fun.aggregate = length)
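
As a rough base R alternative for the aside, assuming all you need is a 0/1 count per id, you can skip the 'value' column and any reshaping package by cross-tabulating id against the combined x/y factor. The names xy, counts and wide are just illustrative:

```r
# Same example data as in the question, without the 'value' column.
dataDF <- data.frame(
  id = 1:6,
  x = c(rep("t1", 3), rep("t2", 3)),
  y = factor(c("A", "B", "A", "B", "B", "C"), levels = c("A", "B", "C", "D"))
)

# interaction() keeps every x/y combination as a level, including unused ones.
# lex.order = TRUE orders the levels t1_A, t1_B, ..., t2_D.
xy <- interaction(dataDF$x, dataDF$y, sep = "_", lex.order = TRUE)

# Cross-tabulate id against the combined factor; empty cells are counted as 0.
counts <- table(id = dataDF$id, xy = xy)

# Turn the contingency table into an ordinary wide data frame.
wide <- cbind(id = as.integer(rownames(counts)), as.data.frame.matrix(counts))
```

Because table() counts occurrences, a combination that appears twice for the same id would show 2 rather than 1.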

For your edit, I'd use tidyr and dplyr rather than reshape2:

library(tidyr)
library(dplyr)

# unique() is used for x because x is a character vector (no levels) in R >= 4.0
dataDF %>% left_join(expand.grid(x = unique(dataDF$x), y = levels(dataDF$y)), .,
                     by = c("x", "y")) %>%
           unite(z, x, y) %>%
           spread(z, value, fill = 0) %>%
           na.omit

First we build every combination of x and y with expand.grid and join the data onto it, so that combinations missing from the data appear as rows with NA ids. Then we unite x and y into one column, z, spread z out into columns, and finally remove the row whose id columns are NA:

  id id2 t1_A t1_B t1_C t1_D t2_A t2_B t2_C t2_D
1  1   1    1    0    0    0    0    0    0    0
2  2   2    0    1    0    0    0    0    0    0
3  3   3    1    0    0    0    0    0    0    0
4  4   1    0    0    0    0    0    1    0    0
5  5   2    0    0    0    0    0    1    0    0
6  6   3    0    0    0    0    0    0    1    0
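
With a newer tidyr the same result may be reachable in one step. This is a sketch, assuming tidyr 1.2.0 or later, where pivot_wider() has a names_expand argument that makes unused factor levels explicit before pivoting:

```r
library(tidyr)

# Same example data as in the question.
dataDF <- data.frame(
  id = 1:6,
  id2 = c(1, 2, 3, 1, 2, 3),
  x = c(rep("t1", 3), rep("t2", 3)),
  y = factor(c("A", "B", "A", "B", "B", "C"), levels = c("A", "B", "C", "D")),
  value = 1
)

# names_expand = TRUE creates a column for every x/y combination,
# including levels of y that never occur; values_fill = 0 fills the gaps.
wide <- pivot_wider(
  dataDF,
  names_from = c(x, y),
  values_from = value,
  values_fill = 0,
  names_expand = TRUE
)
```

Only the name columns are expanded here, so the rows stay limited to the id/id2 pairs that actually occur in the data.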