Add a counter meeting specific conditions

Posted 2019-08-13 22:11

Question:

Problem Statement

Given the data set below, which has two columns, Column1 and Column2, add two more columns called Counter and Counter_Time. The conditions for initializing Counter and Counter_Time are as follows:

  1. The Counter should be incremented only when the value in Column1 > 1 and Column2 = 0.
  2. Within each sequence of rows that satisfy the condition, the Counter must skip the first two rows and start incrementing from the third.
  3. Counter_Time must contain the number of times such a sequence has occurred (a sequence being consecutive data points that satisfy the condition).

Data frame with expected output:

Column1  Column2  Counter  Counter_Time
1.1254   2.784    0        0
4.678    7.985    0        0
8.89     0        0        1
7.65     0        0        1
3.54     0        1        1
4.32     0        2        1
9.83     0        3        1
3.86     4.3      0        1
5.63     9.8      0        1
4.53     0        0        2
6.83     0        0        2
3.431    0        4        2
8.976    0        5        2
9.864    0        6        2
7.3      9.2      0        2
2.3      3.2      0        2
4.3      0        0        3
2.1      0        0        3
4.32     0        7        3

I came across a similar question whose answer showed how to increment a counter, but I wasn't able to satisfy the conditions above. Please note that the Counter should start only after the first two rows that satisfy the condition.

Observations from the data set

  1. Row 3 satisfies the condition: the Counter is not incremented, but Counter_Time is.
  2. The Counter starts at row 5 (per condition 2, the first two rows that satisfy the condition do not trigger the counter).
  3. At row 8 the Counter drops back to 0, and Counter_Time stays the same.
  4. Again, the Counter resumes incrementing at row 12, skipping rows 10 and 11, while Counter_Time was already incremented at row 10.

I have spelled out the problem in detail so that it is clear enough for accurate answers.
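
A plain base-R loop can make the three rules concrete. This is an illustrative sketch and does not come from the question or either answer. It assumes that Counter keeps counting across sequences rather than restarting, as the expected output shows (3, then 4). It also assumes that any row failing the condition ends the current sequence. The variable names are hypothetical.

```r
# Question data (Column1, Column2)
column1 <- c(1.1254, 4.678, 8.89, 7.65, 3.54, 4.32, 9.83, 3.86, 5.63, 4.53,
             6.83, 3.431, 8.976, 9.864, 7.3, 2.3, 4.3, 2.1, 4.32)
column2 <- c(2.784, 7.985, 0, 0, 0, 0, 0, 4.3, 9.8, 0,
             0, 0, 0, 0, 9.2, 3.2, 0, 0, 0)

# Rule 1: a row qualifies when Column1 > 1 and Column2 == 0
cond <- column1 > 1 & column2 == 0

counter      <- integer(length(cond))
counter_time <- integer(length(cond))
total  <- 0L  # running count, never reset between sequences
runs   <- 0L  # number of qualifying sequences seen so far
in_run <- 0L  # position of the current row within its qualifying sequence

for (i in seq_along(cond)) {
  if (cond[i]) {
    in_run <- in_run + 1L
    if (in_run == 1L) runs <- runs + 1L  # Rule 3: a new sequence starts here
    if (in_run > 2L) {                   # Rule 2: skip the first two rows of the sequence
      total <- total + 1L
      counter[i] <- total
    }
  } else {
    in_run <- 0L                         # sequence broken; Counter stays 0 on this row
  }
  counter_time[i] <- runs
}

result <- data.frame(Column1 = column1, Column2 = column2,
                     Counter = counter, Counter_Time = counter_time)
print(result)
```

The printed Counter and Counter_Time columns match the expected output in the question.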

Answer 1:

# Load packages
library(tidyverse)
library(data.table)

# Create example data frame
dt <- fread("Column1 Column2
1.1254  2.784
4.678   7.985 
8.89      0
7.65      0  
3.54      0
4.32      0  
9.83      0
3.86     4.3
5.63     9.8
4.53      0
6.83      0  
3.431     0
8.976     0
9.864     0
7.3      9.2
2.3      3.2
4.3       0
2.1       0
4.32      0  ")

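### Create Counter_Time
# Condition flags qualifying rows; rleid() gives each run of equal Condition values its own ID.
# Because the runs alternate 0/1, halving the ID numbers the qualifying sequences.
# Note: this mapping assumes the first row does NOT meet the condition (true for this data).
dt2 <- dt %>%
  mutate(Merge_ID = 1:n()) %>%
  mutate(Condition = ifelse(Column1 > 1 & Column2 == 0, 1, 0)) %>%
  mutate(ID = rleid(Condition)) %>%
  mutate(Counter_Time = ifelse(Condition == 0, (ID - 1)/2, ID/2))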

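### Create Counter
# Within each Counter_Time group, drop the first two rows (the first two qualifying rows
# of the sequence), keep the remaining qualifying rows, and number them consecutively.
# filter(row_number() >= 3) is used instead of slice(3:n()): for a group with fewer than
# 3 rows, 3:n() counts down (e.g. 3:2 is c(3, 2)) and slice() would keep row 2.
dt3 <- dt2 %>%
  group_by(Counter_Time) %>%
  filter(row_number() >= 3, Condition == 1) %>%
  ungroup() %>%
  mutate(Counter = 1:n()) %>%
  select(Merge_ID, Counter)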

### Merge dt2 and dt3 together, dt4 is the final output
dt4 <- dt2 %>%
  left_join(dt3, by = "Merge_ID") %>%
  mutate(Counter = ifelse(is.na(Counter), 0, Counter)) %>%
  select(Column1, Column2, Counter, Counter_Time)
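
The Counter_Time line in this answer turns rleid() run IDs into sequence numbers with `(ID - 1)/2` for non-qualifying rows and `ID/2` for qualifying ones. That works because the runs alternate and the data starts with a non-qualifying run. The base-R sketch below reproduces the mapping and shows what happens when the first row already qualifies. It uses two hypothetical helpers: `run_id()` is an assumed stand-in for `data.table::rleid()` on NA-free input, and `run_starts()` is an alternative that counts sequence starts directly.

```r
# Condition flags for the question's data (1 = Column1 > 1 & Column2 == 0)
cond <- c(0, 0, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1)

# Hypothetical stand-in for data.table::rleid(): the ID increases whenever the value changes
run_id <- function(x) cumsum(c(TRUE, diff(x) != 0))

id <- run_id(cond)                                           # 1 1 2 2 2 2 2 3 3 4 ...
counter_time <- ifelse(cond == 0, (id - 1) / 2, id / 2)      # 0 0 1 1 1 1 1 1 1 2 ...

# Same mapping when the FIRST row already qualifies: the IDs shift by one
cond2 <- c(1, 1, 1, 0, 1)
id2 <- run_id(cond2)                                         # 1 1 1 2 3
counter_time2 <- ifelse(cond2 == 0, (id2 - 1) / 2, id2 / 2)  # 0.5 0.5 0.5 0.5 1.5

# Hypothetical alternative: count the rows where a qualifying run begins
run_starts <- function(x) cumsum(x == 1 & c(TRUE, head(x, -1) == 0))
counter_time3 <- run_starts(cond2)                           # 1 1 1 1 2
```

On the question's data, `run_starts(cond)` gives the same values as the halving trick. It does not depend on how the data begins.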

Update

The following code replaces everything after dt2 is created. It ensures that when no rows meet the condition, the code still produces an output, with Counter equal to 0 in every row.
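
A guard is needed because `1:n()` does not give an empty sequence for a zero-row table. `1:0` counts down to `c(1, 0)`, a length-2 vector that cannot fill a column of a table with no rows. A quick base-R check (the variable names are illustrative):

```r
# When the filtered table is empty, n() is 0
n_rows <- 0L

seq_colon <- 1:n_rows         # counts DOWN: c(1L, 0L), length 2
seq_safe  <- seq_len(n_rows)  # integer(0), length 0

length(seq_colon)  # 2
length(seq_safe)   # 0
```

Using `seq_len(n())` (or `row_number()`) in the numbering step would also handle the empty case. The explicit `if`/`else` below makes the all-zero outcome visible either way.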

### Set the index
begin_index <- 3

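### Keep qualifying rows from the third row of each sequence onward
# filter(row_number() >= begin_index) avoids slice(begin_index:n()) counting down on short groups
dt3 <- dt2 %>%
  group_by(Counter_Time) %>%
  filter(row_number() >= begin_index, Condition == 1) %>%
  ungroup()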


### Check if dt3 has any rows
if (nrow(dt3) > 0){

  dt3 <- dt3 %>%
    mutate(Counter = 1:n()) %>%
    select(Merge_ID, Counter)

  ### Merge dt2 and dt3 together, dt4 is the final output
  dt4 <- dt2 %>%
    left_join(dt3, by = "Merge_ID") %>%
    mutate(Counter = ifelse(is.na(Counter), 0, Counter)) %>%
    select(Column1, Column2, Counter, Counter_Time)

### If nrow(dt3) is 0, no rows meet the condition
} else {

  ### Create Counter column from dt2
  dt4 <- dt2 %>%
    mutate(Counter = 0) %>%
    select(Column1, Column2, Counter, Counter_Time)

}


Answer 2:

A compact solution with data.table (using the same data as @ycw):

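library(data.table)
dt[, counter := 0                                  # every row starts at 0
   ][, counter_time := cumsum(c(0,diff(Column1 > 1 & Column2 == 0))==1)   # number each qualifying run
     ][Column1 > 1 & Column2 == 0, counter := as.numeric(seq_len(.N) > 2), by = counter_time   # 0 for the first two rows of a run, 1 after (safe even when .N < 2)
       ][counter == 1, counter := cumsum(counter)]  # turn the 1s into a running count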

which gives:

> dt
    Column1 Column2 counter counter_time
 1:  1.1254   2.784       0            0
 2:  4.6780   7.985       0            0
 3:  8.8900   0.000       0            1
 4:  7.6500   0.000       0            1
 5:  3.5400   0.000       1            1
 6:  4.3200   0.000       2            1
 7:  9.8300   0.000       3            1
 8:  3.8600   4.300       0            1
 9:  5.6300   9.800       0            1
10:  4.5300   0.000       0            2
11:  6.8300   0.000       0            2
12:  3.4310   0.000       4            2
13:  8.9760   0.000       5            2
14:  9.8640   0.000       6            2
15:  7.3000   9.200       0            2
16:  2.3000   3.200       0            2
17:  4.3000   0.000       0            3
18:  2.1000   0.000       0            3
19:  4.3200   0.000       7            3

Used data:

library(data.table)
dt <- fread("Column1 Column2
            1.1254  2.784
            4.678   7.985
            8.89      0
            7.65      0
            3.54      0
            4.32      0
            9.83      0
            3.86     4.3
            5.63     9.8
            4.53      0
            6.83      0
            3.431     0
            8.976     0
            9.864     0
            7.3      9.2
            2.3      3.2
            4.3       0
            2.1       0
            4.32      0")
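
The core of the compact chain is `cumsum(c(0, diff(cond)) == 1)`. `diff()` on a logical vector is +1 where a run of TRUEs starts and -1 where it ends, so counting the +1 steps numbers the sequences. The base-R sketch below shows that step on the question's flags. For the per-sequence "skip two" step it uses `ave()` as an assumed stand-in for data.table's `by = counter_time` grouping; that is a swapped-in technique, not the answer's own code.

```r
# Qualifying-row flags for the question's data (Column1 > 1 & Column2 == 0)
cond <- c(FALSE, FALSE, TRUE, TRUE, TRUE, TRUE, TRUE, FALSE, FALSE,
          TRUE, TRUE, TRUE, TRUE, TRUE, FALSE, FALSE, TRUE, TRUE, TRUE)

# +1 where a run starts, -1 where it ends; the leading 0 keeps the length equal to cond
steps <- c(0, diff(cond))
counter_time <- cumsum(steps == 1)

# Position of each row within its sequence (stand-in for by = counter_time)
pos <- ave(as.integer(cond), counter_time, FUN = cumsum)

# 1 from the third qualifying row of each sequence onward, 0 otherwise
flag <- as.integer(cond & pos > 2)

# Running count over the flagged rows; unflagged rows stay 0
counter <- cumsum(flag) * flag
```

One consequence of the leading 0: if the very first row already qualifies, its sequence is numbered 0 rather than 1.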


Tags: r row counter