Assigning values in a sequence to a group of conse

Posted 2019-05-14 00:59

I'm trying to group several consecutive rows (assigning them the same value) while leaving some rows empty (when a certain condition is not met).

My data are locations (xy coordinates), the date/time at which they were measured, and the time span between measurements. Somewhat simplified, they look like this:

ID   X     Y      Time    Span
1    3445  7671   0:00    -
2    3312  7677   4:00    4
3    3309  7680   12:00   8
4    3299  7681   16:00   4
5    3243  7655   20:00   4
6    3222  7612   4:00    8
7    3260  7633   0:00    4
8    3254  7641   8:00    8
9    3230  7612   0:00    16
10   3203  7656   4:00    4
11   3202  7678   8:00    4
12   3159  7609   20:00   12
...

I'd like to assign a value to every sequence of locations that are measured within a time span of 4 hours, and make my data look like this:

ID   X     Y      Time    Span  Sequence
1    3445  7671   0:00    -     -
2    3312  7677   4:00    4     1
3    3309  7680   12:00   8     NA
4    3299  7681   16:00   4     2
5    3243  7655   20:00   4     2
6    3222  7612   4:00    8     NA
7    3260  7633   0:00    4     3
8    3254  7641   8:00    8     NA
9    3230  7612   0:00    16    NA
10   3203  7656   4:00    4     4
11   3202  7678   8:00    4     4
12   3159  7609   20:00   12    NA

I've tried several approaches using a for loop with an ifelse condition, such as:

Sequence <- for (i in 1:max(ID)) {
ifelse (Span <= 4, i+1, "NA")
}

without any luck. I know my attempt is incorrect, but my programming skills are really basic and I haven't found a similar problem on the web.

Any ideas would be much appreciated!

3 answers
(This account has been banned)
#2 · 2019-05-14 01:22
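Sequence <- rep(NA, length(Span))  # assumes Span is numeric, with NA in row 1
count <- 0
for (i in seq_along(Span)) {
  if (is.na(Span[i]) || Span[i] > 4) next
  # open a new group only when the previous row was not in one
  if (i == 1 || is.na(Sequence[i - 1])) count <- count + 1
  Sequence[i] <- count
}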
做个烂人
#3 · 2019-05-14 01:33

Here is a longish one-liner:

ifelse(x <- DF$Span == 4, cumsum(c(head(x, 1), tail(x, -1) - head(x, -1) == 1)), NA)
# [1] NA  1 NA  2  2 NA  3 NA NA  4  4 NA

Explanation:

  • x is a vector of TRUE/FALSE showing where Span is 4.
  • tail(x, -1) is a safe way of writing x[2:length(x)]
  • head(x, -1) is a safe way of writing x[1:(length(x)-1)]
  • tail(x, -1) - head(x, -1) == 1 is a vector of TRUE/FALSE showing where we went from Span != 4 to Span == 4.
  • since the vector above is one element shorter than x, I prepended head(x, 1) in front of it. head(x, 1) is a safe way of writing x[1].
  • Then I take the cumsum so it converts the vector TRUE/FALSE into a vector of increasing integers: where Span jumps from !=4 to ==4 it increases by 1, otherwise stays constant.
  • Everything is wrapped into an ifelse so you only see numbers where x is TRUE, i.e., where Span == 4.
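
To make the steps above concrete, here is a sketch that unpacks the one-liner into intermediate vectors. It assumes the Span column from the question with the "-" in row 1 stored as NA; the names span, starts, and run_id are only for illustration and are not part of the answer.

```r
# Span values from the question; the "-" in row 1 is assumed to be NA here
span <- c(NA, 4, 8, 4, 4, 8, 4, 8, 16, 4, 4, 12)

# TRUE where Span is exactly 4 (NA is treated as FALSE)
x <- !is.na(span) & span == 4

# TRUE where a run of 4s begins: the first element, or a FALSE -> TRUE step
starts <- c(head(x, 1), tail(x, -1) - head(x, -1) == 1)

# Running count of run starts, kept only where x is TRUE
run_id <- ifelse(x, cumsum(starts), NA)
print(run_id)
```

Printing starts and cumsum(starts) separately is an easy way to see where each group number comes from.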
趁早两清
#4 · 2019-05-14 01:33

Here's another alternative using rle and rep. We'll assume that your data.frame is named "test".
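
For anyone who wants to run the steps below, a data frame like the one in the question could be built roughly as follows. This is only a sketch: the column types are assumptions, and Span is kept as text because row 1 holds "-", which matches the printed output further down.

```r
# Sample data copied from the question. Span is stored as text because of the
# "-" in row 1 (an assumption about how the asker's data was read in).
test <- data.frame(
  ID   = 1:12,
  X    = c(3445, 3312, 3309, 3299, 3243, 3222, 3260, 3254, 3230, 3203, 3202, 3159),
  Y    = c(7671, 7677, 7680, 7681, 7655, 7612, 7633, 7641, 7612, 7656, 7678, 7609),
  Time = c("0:00", "4:00", "12:00", "16:00", "20:00", "4:00",
           "0:00", "8:00", "0:00", "4:00", "8:00", "20:00"),
  Span = c("-", "4", "8", "4", "4", "8", "4", "8", "16", "4", "4", "12"),
  stringsAsFactors = FALSE
)
str(test)
```

With a text column, test$Span == 4 still works because R compares the values as strings; if Span were numeric with NA in row 1, the NA would need handling before it could be used as an index.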

First, initialize your "Sequence" column, filling it with NA.

test$Sequence <- NA

Second, specify the condition that you are matching, in this case, test$Span == 4.

x <- test$Span == 4

Third, combine the two parts of rle's output (lengths and values) to get the length of each run in which the condition is TRUE.

spanSeq <- rle(x)$lengths[rle(x)$values == TRUE]
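
To see what rle() contributes here, the sketch below prints its two components for the TRUE/FALSE vector from step two. The vector is typed in by hand from the question's Span column and the name r is just for illustration.

```r
# TRUE where Span == 4 in the question's twelve rows
x <- c(FALSE, TRUE, FALSE, TRUE, TRUE, FALSE, TRUE, FALSE, FALSE, TRUE, TRUE, FALSE)

r <- rle(x)          # run-length encoding: one entry per run of equal values
print(r$lengths)     # how long each run is
print(r$values)      # whether each run is a run of TRUE or of FALSE

# Keep only the lengths of the TRUE runs
spanSeq <- r$lengths[r$values]
print(spanSeq)

# One group number per TRUE run, repeated for the length of that run
print(rep(seq_along(spanSeq), times = spanSeq))
```

Storing the rle() result once, as r does here, also avoids computing it twice.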

Finally, use rep with the times argument set to the result obtained in step 3. Subset the required values of test$Sequence according to the index matched by test$Span == 4, and replace them with your new sequence.

test$Sequence[x] <- rep(seq_along(spanSeq), times = spanSeq)
test
#    ID    X    Y  Time Span Sequence
# 1   1 3445 7671  0:00    -       NA
# 2   2 3312 7677  4:00    4        1
# 3   3 3309 7680 12:00    8       NA
# 4   4 3299 7681 16:00    4        2
# 5   5 3243 7655 20:00    4        2
# 6   6 3222 7612  4:00    8       NA
# 7   7 3260 7633  0:00    4        3
# 8   8 3254 7641  8:00    8       NA
# 9   9 3230 7612  0:00   16       NA
# 10 10 3203 7656  4:00    4        4
# 11 11 3202 7678  8:00    4        4
# 12 12 3159 7609 20:00   12       NA

Once you understand the steps involved, you can also do this directly with within(). The following would give you the same result:

within(test, {
  Sequence <- NA
  spanSeq <- rle(Span == 4)$lengths[rle(Span == 4)$values == TRUE]
  Sequence[Span == 4] <- rep(seq_along(spanSeq), times = spanSeq)
  rm(spanSeq)
})
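
If the condition might change (for example Span <= 4, as in the question's own attempt), the same rle and rep idea could be wrapped in a small function. The helper below, label_runs(), is hypothetical and not part of this answer; it assumes a numeric Span with NA in row 1 and treats NA as "condition not met".

```r
# Hypothetical helper: number each run of TRUE values, NA elsewhere.
# NA in `cond` is treated as FALSE (an assumption, not part of the answer).
label_runs <- function(cond) {
  cond <- !is.na(cond) & cond
  r <- rle(cond)
  out <- rep(NA_integer_, length(cond))
  out[cond] <- rep(seq_len(sum(r$values)), times = r$lengths[r$values])
  out
}

# Span values from the question, with the "-" in row 1 stored as NA
span <- c(NA, 4, 8, 4, 4, 8, 4, 8, 16, 4, 4, 12)
print(label_runs(span <= 4))
```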