I'm collecting logs in R with 3 columns:
week, probe, and number of observations.
There aren't record when there is no observation.
week=c(1,2,2,4)
probe=c("A","C","B","C")
obs=c(2,4,3,1)
logs=data.frame(week,probe,obs)
logs
week probe obs
1 A 2
2 C 4
2 B 3
4 C 1
I want to reformat the data so that it includes all weeks and all probes even if there was no observation, so that it looks like this:
week probe obs
1 A 2
1 B 0
1 C 0
1 D 0
2 A 0
2 B 0
2 C 3
2 D 4
3 A 0
3 B 0
3 C 0
3 D 0
4 A 0
4 B 0
4 C 1
4 D 0
I have the list of all probes here:
allprobes=c("A","B","C","D")
and I want to look at these weeks:
allweeks=c(1:4)
I've been looking at melt, cast, reshape, but I only manage to get 1 line per id or month...
as I actually want to keep the original format of the logs.
It seems easy enough at first but I'm now stuck...
Any advice on how to get the data formatted this way?
Thanks a lot for any help.
Two options in base R:
Use expand.grid
and merge
:
> fullFrame <- expand.grid(allweeks, allprobes)
> names(fullFrame) <- c("week", "probe")
> merge(fullFrame, logs, all = TRUE)
week probe obs
1 1 A 2
2 1 B NA
3 1 C NA
4 1 D NA
5 2 A NA
6 2 B 3
7 2 C 4
8 2 D NA
9 3 A NA
10 3 B NA
11 3 C NA
12 3 D NA
13 4 A NA
14 4 B NA
15 4 C 1
16 4 D NA
expand.grid
will create a data.frame
of all the possible combinations of your "allprobes" and "allweeks" objects. Then, we rename the columns of that new data.frame
to match the relevant columns from your "logs" data.frame
when using merge
. The argument all = TRUE
tells merge
to fill in the missing values with NA
.
If you want zeroes instead of NA
, proceed as follows:
fullFrame <- expand.grid(allweeks, allprobes)
names(fullFrame) <- c("week", "probe")
finalLogs <- merge(fullFrame, logs, all = TRUE)
finalLogs[is.na(finalLogs)] <- 0
Use xtabs
after converting "week" and "probe" to factors
If you convert "week" and "probe" to factors that includes all the relevant levels, then you can simply use xtabs
wrapped in data.frame
:
logs$week <- factor(logs$week, levels=c(1, 2, 3, 4))
logs$probe <- factor(logs$probe, levels=c("A", "B", "C", "D"))
data.frame(xtabs(obs ~ week + probe, logs))
# week probe Freq
# 1 1 A 2
# 2 2 A 0
# 3 3 A 0
# 4 4 A 0
# 5 1 B 0
# 6 2 B 3
# 7 3 B 0
# 8 4 B 0
# 9 1 C 0
# 10 2 C 4
# 11 3 C 0
# 12 4 C 1
# 13 1 D 0
# 14 2 D 0
# 15 3 D 0
# 16 4 D 0
The complete
function in the tidyr
package is a nice utility for this operation:
# get all the levels in the factor
logs$probe = factor(logs$probe, levels = allprobes)
logs$week = factor(logs$week, levels = 1:4)
tidyr::complete(logs, week, probe, fill = list(obs = 0))
# # A tibble: 16 × 3
# week probe obs
# <fctr> <fctr> <dbl>
# 1 1 A 2
# 2 1 B 0
# 3 1 C 0
# 4 1 D 0
# 5 2 A 0
# 6 2 B 3
# 7 2 C 4
# 8 2 D 0
# 9 3 A 0
# 10 3 B 0
# 11 3 C 0
# 12 3 D 0
# 13 4 A 0
# 14 4 B 0
# 15 4 C 1
# 16 4 D 0