How I can find out 1st and last observation with i

2019-08-27 03:59发布

data is below

dialled     Ringing     state   duration
NA  NA  NA  0
NA  NA  NA  0
NA  NA  NA  0
NA  NA  NA  0
123 NA  NA  0
123 NA  NA  0
123 NA  NA  0
123 NA  NA  60
NA  NA  active  0
NA  NA  active  0
NA  NA  inactive    0
NA  NA  inactive    0
123 NA  inactive    0
123 NA  inactive    0
123 NA  inactive    0
NA  NA  inactive  0
NA  NA  inactive  0
NA  NA  inactive    0
222 NA  inactive    0
222 NA  inactive    0
222 NA  inactive    37
NA  NA  active  0
NA  NA  active  0
NA  NA  inactive    0
123 NA  inactive    0
123 NA  inactive    0
123 NA  active  60
NA  NA  active  0
NA  NA  active  0
NA  NA  active  0
NA  NA  active  0
123 NA  inactive    0
123 NA  inactive    0
123 NA  inactive    0


answer i am looking for is

dialled     Ringing     state   duration
123 NA  NA          0
123 NA  NA          60
123 NA  inactive    0
123 NA  inactive    0
222 NA  inactive    0
222 NA  inactive    37
123 NA  inactive    0
123 NA  inactive    60
123 NA  inactive    0
123 NA  inactive    0

also If you can help me to get the immediate next row, after last row of every by group and Rbind them

标签: r
2条回答
仙女界的扛把子
2楼-- · 2019-08-27 04:42

In data.table v1.9.5, there's a new function rleid() that helps accomplish this task fairly straightforward. You can install it by following these instructions:

require(data.table)
setDT(df)[, if (!is.na(dialled[1L])) .SD[c(1L, .N)], 
                by=.(dialled, rleid(dialled))]
#     dialled rleid Ringing    state duration
#  1:     123     2      NA       NA        0
#  2:     123     2      NA       NA       60
#  3:     123     4      NA inactive        0
#  4:     123     4      NA inactive        0
#  5:     222     6      NA inactive        0
#  6:     222     6      NA inactive       37
#  7:     123     8      NA inactive        0
#  8:     123     8      NA   active       60
#  9:     123    10      NA inactive        0
# 10:     123    10      NA inactive        0

.SD contains the subset of data for groups specified in by =.

查看更多
做个烂人
3楼-- · 2019-08-27 04:50

You could create a grouping variable "grp" (similar as here). Subset the rows of "df" that are not '0' for "grp", use slice to get the first and last row for each "grp", ungroup and remove the grp variable.

rl <- rle(!is.na(df$dialled))
grp <- inverse.rle(within.list(rl, 
      values[values] <- cumsum(values)[values]))
df$grp <- grp
library(dplyr)
df %>%
    filter(grp!=0) %>% 
    group_by(grp) %>% 
    slice(c(1, n()))%>%
    ungroup() %>%
    select(-grp)
#       dialled Ringing    state duration
#1      123      NA       NA        0
#2      123      NA       NA       60
#3      123      NA inactive        0
#4      123      NA inactive        0
#5      222      NA inactive        0
#6      222      NA inactive       37
#7      123      NA inactive        0
#8      123      NA   active       60
#9      123      NA inactive        0
#10     123      NA inactive        0

Or a base R option would be to get the row index of first and last rows of subset dataset "df1" based on "grp" and then use it to extract the rows.

df1 <- df[grp!=0,]
df2 <- df1[unlist(tapply(1:nrow(df1), grp[grp!=0],
           FUN=function(x) c(head(x,1), tail(x,1)))),]

Update

It is not clear from the comments. Perhaps this helps

 df2 %>%
    group_by(grp) %>% 
    filter(any(duration>0)) %>% 
    slice(1)
 #    dialled Ringing    state duration grp
 #1     123      NA       NA        0   1
 #2     222      NA inactive        0   3
 #3     123      NA inactive        0   4

data

df <- structure(list(dialled = c(NA, NA, NA, NA, 123L, 123L, 123L, 
123L, NA, NA, NA, NA, 123L, 123L, 123L, NA, NA, NA, 222L, 222L, 
222L, NA, NA, NA, 123L, 123L, 123L, NA, NA, NA, NA, 123L, 123L, 
123L), Ringing = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
 NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
 NA, NA, NA, NA, NA, NA, NA), state = c(NA, NA, NA, NA, NA, NA, 
 NA, NA, "active", "active", "inactive", "inactive", "inactive", 
 "inactive", "inactive", "inactive", "inactive", "inactive", "inactive", 
 "inactive", "inactive", "active", "active", "inactive", "inactive", 
 "inactive", "active", "active", "active", "active", "active", 
 "inactive", "inactive", "inactive"), duration = c(0L, 0L, 0L, 
 0L, 0L, 0L, 0L, 60L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 
 0L, 0L, 37L, 0L, 0L, 0L, 0L, 0L, 60L, 0L, 0L, 0L, 0L, 0L, 0L, 
 0L)), .Names = c("dialled", "Ringing", "state", "duration"),
 class = "data.frame", row.names = c(NA, -34L))
查看更多
登录 后发表回答