I am looking for a way to, within id
groups, count unique occurrences of value shifts in TF
in the data datatbl
.
I want to count both forward and backwards from when TF
changes between 1
and 0
or o
and 1
. The counting is to be stored in a new variable PM##
, so that the PM##
s holds each unique shift in TF
, in both plus and minus. The MWE below leads to an outcome with 7 PM, but my production data can have 15 or more shifts. If a TF
values does not change between NA
's I want to mark it 0
.
This question is similar to a question I previously asked, but the last part about TF
standing alone is new. Both Uwe and Psidom provided elegant answers to the initial question using data.table
here and using tidyverse
here. after conferencing with Uwe, I am posting this slightly modified version of my question.
If this question violates any SO policies please let me know and I'll be happy to reopen my initial question or append this an bounty-issue.
To illustrate my question with a minimal working example. I have data like this,
what I have,
# install.packages(c("tidyverse"), dependencies = TRUE)
library(tibble)
tbl <- tibble(id = c(rep(10L, 17L), rep(0L, 13L), rep(1L, 10L)),
TF = c(NA, NA, 0, NA, 0, NA, 1, 1, 1, 1, 1, NA, 1, 0, 1, 0, 1, NA, 0L, NA, 0L,
0L, 1L, 1L, 1L, 0L, 0L, NA, NA, 0L, NA, 0L, 0L, 0L, 1L, 1L, 1L, 0L, NA, 1L))
tbl %>% print(n=18)
#> # A tibble: 40 x 2
#> id TF
#> <int> <dbl>
#> 1 10 NA
#> 2 10 NA
#> 3 10 0
#> 4 10 NA
#> 5 10 0
#> 6 10 NA
#> 7 10 1
#> 8 10 1
#> 9 10 1
#> 10 10 1
#> 11 10 1
#> 12 10 NA
#> 13 10 1
#> 14 10 0
#> 15 10 1
#> 16 10 0
#> 17 10 1
#> 18 0 NA
#> # ... with 22 more rows
what I am trying to obtain,
tblPM <- structure(list(id = c(10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L,
10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L), TF = c(NA, NA, 0, NA, 0, NA, 1, 1, 1, 1, 1,
NA, 1, 0, 1, 0, 1, NA, 0, NA, 0, 0, 1, 1, 1, 0, 0,
NA, NA, 0, NA, 0, 0, 0, 1, 1, 1, 0, NA, 1), PM01 = c(NA,
NA, 0L, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, 0L, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, -3L,
-2L, -1L, 1L, 2L, 3L, NA, NA, NA), PM02 = c(NA, NA, NA, NA, 0L,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, -2L,
-1L, 1L, 2L, 3L, NA, NA, NA, NA, NA, NA, NA, NA, NA, -3L, -2L,
-1L, 1L, NA, NA), PM03 = c(NA, NA, NA, NA, NA, NA, 0L, 0L, 0L,
0L, 0L, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, -3L, -2L,
-1L, 1L, 2L, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
0L), PM04 = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
-1L, 1L, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, 0L, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA), PM05 = c(NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, -1L, 1L, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA), PM06 = c(NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, -1L, 1L, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA), PM07 = c(NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, -1L, 1L, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA
)), .Names = c("id", "TF", "PM01", "PM02", "PM03", "PM04", "PM05",
"PM06", "PM07"), class = c("tbl_df",
"tbl", "data.frame"), row.names = c(NA, -40L
))
tblPM %>% print(n=18)
#> # A tibble: 40 x 9
#> id TF PM01 PM02 PM03 PM04 PM05 PM06 PM07
#> <int> <dbl> <int> <int> <int> <int> <int> <int> <int>
#> 1 10 NA NA NA NA NA NA NA NA
#> 2 10 NA NA NA NA NA NA NA NA
#> 3 10 0 0 NA NA NA NA NA NA
#> 4 10 NA NA NA NA NA NA NA NA
#> 5 10 0 NA 0 NA NA NA NA NA
#> 6 10 NA NA NA NA NA NA NA NA
#> 7 10 1 NA NA 0 NA NA NA NA
#> 8 10 1 NA NA 0 NA NA NA NA
#> 9 10 1 NA NA 0 NA NA NA NA
#> 10 10 1 NA NA 0 NA NA NA NA
#> 11 10 1 NA NA 0 NA NA NA NA
#> 12 10 NA NA NA NA NA NA NA NA
#> 13 10 1 NA NA NA -1 NA NA NA
#> 14 10 0 NA NA NA 1 -1 NA NA
#> 15 10 1 NA NA NA NA 1 -1 NA
#> 16 10 0 NA NA NA NA NA 1 -1
#> 17 10 1 NA NA NA NA NA NA 1
#> 18 0 NA NA NA NA NA NA NA NA
#> # ... with 22 more rows
identical([some solution], tblPM)
#> [1] TRUE
update w/ microbenchmark
2018-01-24 14:20:18Z,
Thanks to Fierr and Chris for taking the time to tease out the logic and submit an answer. Inspired my this setup I've computed a small microbenchmark comparison of thier functions. I put Fierrs answer into the function
tidyverse_Fierr()and Chris' answer into
dt_Chris()` (if someone want the exact functions please let me know and I'll add them here.
After some minor tweaks they both come out identical when match with tblPM
, i.e.
identical(tblPM, tidyverse_Fierr(tbl))
#> [1] TRUE
identical(tblPM, dt_Chris(tbl))
#> [1] TRUE
Now to the quick microbenchmark,
df_test <- bind_rows(rep(list(tbl), 111))
microbenchmark::microbenchmark(tidyverse_Fierr(df_test), dt_Chris(df_test), times = 3*1)
#> Unit: milliseconds
#> expr min mean median uq max neval cld
#> tidyverse_Fierr(df_test) 19503.366 20171.268 20080.99 20505.219 20929.4489 3 b
#> dt_Chris(df_test) 199.165 233.924 203.72 251.304 298.8887 3 a
Interestingly the tidy_method comes out way faster in this kinda similar comparison.