Grouping rows on the basis of row differences in R

This question already has an answer here:

Group rows based on gaps in time series [duplicate] 2 answers

I have a set of animal locations with different sampling intervals. What I want to do is group and seqences where the sampling interval matches a certain criteria (e.g. is below a certain value). Let me illustrate with some dummy data:

start <- Sys.time()
timediff <- c(rep(5,3),20,rep(5,2))
timediff <- cumsum(timediff)

# Set up a dataframe with a couple of time values
df <- data.frame(TimeDate = start + timediff)

# Calculate the time differences between the rows
df$TimeDiff <- c(as.integer(tail(df$TimeDate,-1) - head(df$TimeDate,-1)),NA)

# Define a criteria in order to form groups
df$TimeDiffSmall <- df$TimeDiff <= 5

             TimeDate TimeDiff TimeDiffSmall
1 2016-03-15 23:11:49        5          TRUE
2 2016-03-15 23:11:54        5          TRUE
3 2016-03-15 23:11:59       20         FALSE
4 2016-03-15 23:12:19        5          TRUE
5 2016-03-15 23:12:24        5          TRUE
6 2016-03-15 23:12:29       NA            NA

In this dummy data, rows 1:3 belong to one group, since the time difference between them is <= 5 seconds. 4 - 6 belong to the second group, but hypothetically there could be a number of rows in between the two groups that dont belong to any group (TimeDiffSmall equals to FALSE).

Combining the information from two multiple SO answers (e.g. part 1), I've create a function that solves this problem.

number.groups <- function(input){
  # part 1: numbering successive TRUE values
  input[is.na(input)] <- F
  x.gr <- ifelse(x <- input == TRUE, cumsum(c(head(x, 1), tail(x, -1) - head(x, -1) == 1)),NA)
  # part 2: including last value into group
  items <- which(!is.na(x.gr))
  items.plus <- c(1,items+1)
  sel <- !(items.plus %in% items)
  sel.idx <- items.plus[sel]
  x.gr[sel.idx] <- x.gr[sel.idx-1]
  return(x.gr)


 # Apply the function to create groups
 df$Group <- number.groups(df$TimeDiffSmall)

             TimeDate TimeDiff TimeDiffSmall Group
1 2016-03-15 23:11:49        5          TRUE     1
2 2016-03-15 23:11:54        5          TRUE     1
3 2016-03-15 23:11:59       20         FALSE     1
4 2016-03-15 23:12:19        5          TRUE     2
5 2016-03-15 23:12:24        5          TRUE     2
6 2016-03-15 23:12:29       NA            NA     2

This function actually works to solve my problem. This this is, it seems like a crazy and rookie way to go about this. Is there a function that could solve my problem more professionally?

标签： r grouping

1条回答

老娘就宠你

2楼-- · 2019-06-20 01:54

Like @thelatemail, I'd use the following to get the group IDs. It works because cumsum() will end up incrementing the group count each time it reaches an element preceded by a greater-than-5-second time interval.

df$Group <- cumsum(c(TRUE, diff(df$TimeDate) > 5))
df$Group
# [1] 1 1 1 2 2 2

0人赞添加讨论(0) 举报

Grouping rows on the basis of row differences in R

采纳回答

编辑标签

举报内容

检举类型

检举原因

检举说明(必填)

打开微信“扫一扫”，打开网页后点击屏幕右上角分享按钮

付费偷看金额在0.1-10元之间