tidyr::complete() adds rows to a data.frame for combinations of column values that are missing from the data. Example:
library(dplyr)
library(tidyr)

df <- data.frame(person = c(1, 2, 2),
                 observation_id = c(1, 1, 2),
                 value = c(1, 1, 1))

df %>%
  tidyr::complete(person,
                  observation_id,
                  fill = list(value = 0))
yields
# A tibble: 4 × 3
  person observation_id value
   <dbl>          <dbl> <dbl>
1      1              1     1
2      1              2     0
3      2              1     1
4      2              2     1
where the combination person == 1 and observation_id == 2, which is missing from df, has been added with value filled in as 0.
What would be the equivalent of this in data.table?
I reckon that the philosophy of data.table entails fewer specially-named functions for tasks than you'll find in the tidyverse, so some extra coding is required, like:
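A minimal sketch of that extra coding, assuming the approach is a cross join of the unique column values via `CJ()` followed by a join back onto the data (the names `dt`, `combos` and `res` are illustrative, not from the original post):

```r
library(data.table)

df <- data.frame(person = c(1, 2, 2),
                 observation_id = c(1, 1, 2),
                 value = c(1, 1, 1))
dt <- as.data.table(df)

# Build every combination of the unique values of the two columns.
# CJ() is data.table's cross join; unique = TRUE de-duplicates each input.
combos <- dt[, CJ(person = person, observation_id = observation_id, unique = TRUE)]

# Join the original rows onto the full grid.
# Combinations absent from dt come back with value = NA.
res <- dt[combos, on = c("person", "observation_id")]
```

Here the combination person 1, observation_id 2 appears in `res` with `value` set to `NA`.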
After this, you still have to manually handle the filling of values for missing levels. Following @thelatemail's comment:
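One way to do that fill, sketched under the assumption that `res` is the result of the cross join described above and that 0 is the desired default (the setup is repeated so the snippet runs on its own):

```r
library(data.table)

dt <- data.table(person = c(1, 2, 2),
                 observation_id = c(1, 1, 2),
                 value = c(1, 1, 1))

# Complete the combinations; the missing one gets value = NA.
res <- dt[CJ(person = person, observation_id = observation_id, unique = TRUE),
          on = c("person", "observation_id")]

# Replace the NAs introduced by the join with 0, updating res by reference.
res[is.na(value), value := 0]
```

Note that this also overwrites any `NA` already present in the original `value` column, so it is only safe when the input has no genuine missing values.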
See @Jealie's answer regarding a feature that will sidestep this.
Certainly, it's crazy that the column names have to be entered three times here. But on the other hand, one can write a wrapper:
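A possible shape for such a wrapper, as a sketch only: the function name `complete_dt` and its arguments `cols` and `fill` are hypothetical, chosen to mirror `tidyr::complete()`, and are not part of data.table.

```r
library(data.table)

# Hypothetical helper: complete the combinations of `cols` in DT and
# optionally fill NAs using a named list of defaults.
complete_dt <- function(DT, cols, fill = NULL) {
  # Cross join of the unique values of each requested column.
  grid <- do.call(CJ, c(as.list(DT[, cols, with = FALSE]), list(unique = TRUE)))
  # Join the data onto the full grid; new combinations get NA elsewhere.
  res <- DT[grid, on = cols]
  # Fill NAs column by column (this also fills NAs already in the input).
  for (nm in names(fill)) {
    idx <- which(is.na(res[[nm]]))
    if (length(idx)) set(res, i = idx, j = nm, value = fill[[nm]])
  }
  res[]
}

dt <- data.table(person = c(1, 2, 2),
                 observation_id = c(1, 1, 2),
                 value = c(1, 1, 1))
out <- complete_dt(dt, c("person", "observation_id"), fill = list(value = 0))
```

With this, the column names are written once, in the `cols` argument.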
As a quick way of avoiding typing the names three times for the first step, here's @thelatemail's idea:
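A sketch in the spirit of that idea, not necessarily the exact code from the comment: keep the column names in a character vector (here called `vars`, an illustrative name) and build the `CJ()` call with `do.call()`.

```r
library(data.table)

dt <- data.table(person = c(1, 2, 2),
                 observation_id = c(1, 1, 2),
                 value = c(1, 1, 1))

# The column names are typed once and reused for both the grid and the join.
vars <- c("person", "observation_id")

# Pass the selected columns to CJ() as a named list, plus unique = TRUE.
grid <- do.call(CJ, c(as.list(dt[, vars, with = FALSE]), unique = TRUE))

res <- dt[grid, on = vars]
```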
Update: now you don't need to enter names twice in CJ thanks to @MichaelChirico & @MattDowle for the improvement.
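With that improvement the first step might look like the following sketch. It assumes a data.table version recent enough that `CJ()` names its output columns after the symbols passed to it; on older versions the columns would be named `V1`, `V2` and the `on =` join would not find them.

```r
library(data.table)

dt <- data.table(person = c(1, 2, 2),
                 observation_id = c(1, 1, 2),
                 value = c(1, 1, 1))

# CJ() is evaluated inside dt, so person and observation_id refer to columns.
# Auto-naming means person = person is no longer needed.
res <- dt[CJ(person, observation_id, unique = TRUE),
          on = .(person, observation_id)]
```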
There might be a better answer out there, but this works:
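A minimal sketch of one approach that fits this description, assuming a keyed join against a cross join of the unique values (this is an assumption about the method, since the answer's code is not shown here):

```r
library(data.table)

df <- data.frame(person = c(1, 2, 2),
                 observation_id = c(1, 1, 2),
                 value = c(1, 1, 1))
dt <- as.data.table(df)

# Key the table so that dt[i] joins on person and observation_id by position.
setkey(dt, person, observation_id)

# Join against every combination of the unique key values.
res <- dt[CJ(unique(person), unique(observation_id))]
```

The missing combination (person 1, observation_id 2) is added with `value` set to `NA`.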
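Which gives all four person/observation_id combinations, with value set to NA for the previously missing one (person 1, observation_id 2).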
Now, if you would like to be able to fill with any value (and not NA), I would suggest waiting for the corresponding feature to be finished, or contributing to it :)