How to format a pivot like table that includes rec

2019-01-28 07:43发布

问题:

I'm collecting logs in R with 3 columns:
week, probe, and number of observations.
There aren't record when there is no observation.

week=c(1,2,2,4)  
probe=c("A","C","B","C")  
obs=c(2,4,3,1)
logs=data.frame(week,probe,obs)

logs

week probe obs  
1     A   2
2     C   4
2     B   3
4     C   1

I want to reformat the data so that it includes all weeks and all probes even if there was no observation, so that it looks like this:

week probe obs  
1     A   2  
1     B   0  
1     C   0  
1     D   0  
2     A   0  
2     B   0  
2     C   3  
2     D   4  
3     A   0  
3     B   0  
3     C   0  
3     D   0  
4     A   0  
4     B   0  
4     C   1  
4     D   0  

I have the list of all probes here:

allprobes=c("A","B","C","D")

and I want to look at these weeks:

allweeks=c(1:4)

I've been looking at melt, cast, reshape, but I only manage to get 1 line per id or month... as I actually want to keep the original format of the logs. It seems easy enough at first but I'm now stuck... Any advice on how to get the data formatted this way?

Thanks a lot for any help.

回答1:

Two options in base R:

Use expand.grid and merge:

> fullFrame <- expand.grid(allweeks, allprobes)
> names(fullFrame) <- c("week", "probe")
> merge(fullFrame, logs, all = TRUE)
   week probe obs
1     1     A   2
2     1     B  NA
3     1     C  NA
4     1     D  NA
5     2     A  NA
6     2     B   3
7     2     C   4
8     2     D  NA
9     3     A  NA
10    3     B  NA
11    3     C  NA
12    3     D  NA
13    4     A  NA
14    4     B  NA
15    4     C   1
16    4     D  NA

expand.grid will create a data.frame of all the possible combinations of your "allprobes" and "allweeks" objects. Then, we rename the columns of that new data.frame to match the relevant columns from your "logs" data.frame when using merge. The argument all = TRUE tells merge to fill in the missing values with NA.

If you want zeroes instead of NA, proceed as follows:

fullFrame <- expand.grid(allweeks, allprobes)
names(fullFrame) <- c("week", "probe")
finalLogs <- merge(fullFrame, logs, all = TRUE)
finalLogs[is.na(finalLogs)] <- 0

Use xtabs after converting "week" and "probe" to factors

If you convert "week" and "probe" to factors that includes all the relevant levels, then you can simply use xtabs wrapped in data.frame:

logs$week <- factor(logs$week, levels=c(1, 2, 3, 4))
logs$probe <- factor(logs$probe, levels=c("A", "B", "C", "D"))
data.frame(xtabs(obs ~ week + probe, logs))
#    week probe Freq
# 1     1     A    2
# 2     2     A    0
# 3     3     A    0
# 4     4     A    0
# 5     1     B    0
# 6     2     B    3
# 7     3     B    0
# 8     4     B    0
# 9     1     C    0
# 10    2     C    4
# 11    3     C    0
# 12    4     C    1
# 13    1     D    0
# 14    2     D    0
# 15    3     D    0
# 16    4     D    0


回答2:

The complete function in the tidyr package is a nice utility for this operation:

# get all the levels in the factor
logs$probe = factor(logs$probe, levels = allprobes)
logs$week = factor(logs$week, levels = 1:4)

tidyr::complete(logs, week, probe, fill = list(obs = 0))
# # A tibble: 16 × 3
#      week  probe   obs
#    <fctr> <fctr> <dbl>
# 1       1      A     2
# 2       1      B     0
# 3       1      C     0
# 4       1      D     0
# 5       2      A     0
# 6       2      B     3
# 7       2      C     4
# 8       2      D     0
# 9       3      A     0
# 10      3      B     0
# 11      3      C     0
# 12      3      D     0
# 13      4      A     0
# 14      4      B     0
# 15      4      C     1
# 16      4      D     0


标签: r reshape