Special grouping number for each pairs

There is already some part of the question answered here special-group-number-for-each-combination-of-data. In most cases we have pairs and other data values inside the data. What we want to achieve is that number those groups if those pairs exist and number them until the next pairs.

As I concentrated each pairs such as c("bad","good") would like to group them and for pairs c('Veni',"vidi","Vici") assign unique number 666.

Here is the example data

names <- c(c("bad","good"),1,2,c("good","bad"),111,c("bad","J.James"),c("good","J.James"),333,c("J.James","good"),761,'Veni',"vidi","Vici")

  df <- data.frame(names)

Here is the real and general case expected output

     names  Group
1      bad    1
2     good    1
3        1    1
4        2    1
5     good    2
6      bad    2
7      111    2
8      bad    3
9  J.James    3
10    good    4
11 J.James    4
12     333    4
13 J.James    5
14    good    5
15     761    5
16    Veni    666
17    vidi    666
18    Vici    666

标签： r dplyr aggregate

1条回答

一纸荒年 Trace。

2楼-- · 2019-03-03 08:21

Here are two approaches which reproduce OP's expected result for the given sample dataset.`

Both work in the same way. First, all "disturbing" rows, i.e., rows which do not contain "valid" names, are skipped and the rows with "valid" names are simply numbered in groups of 2. Second, the rows with exempt names are given the special group number. Finally, the NA rows are filled by carrying the last observation forward.

`data.table`

library(data.table)
names <- c(c("bad","good"),1,2,c("good","bad"),111,c("bad","J.James"),c("good","J.James"),333,c("J.James","good"),761,'Veni',"vidi","Vici")
exempt <- c("Veni", "vidi", "Vici")
data.table(names)[is.na(as.numeric(names)) & !names %in% exempt, 
                  grp := rep(1:.N, each = 2L, length.out = .N)][
                    names %in% exempt, grp := 666L][
                      , grp := zoo::na.locf(grp)][]

      names grp
 1:     bad   1
 2:    good   1
 3:       1   1
 4:       2   1
 5:    good   2
 6:     bad   2
 7:     111   2
 8:     bad   3
 9: J.James   3
10:    good   4
11: J.James   4
12:     333   4
13: J.James   5
14:    good   5
15:     761   5
16:    Veni 666
17:    vidi 666
18:    Vici 666

`dplyr`/`tidyr`

Here is my attempt to provide a dplyr/tidyr solution:

library(dplyr)
as_tibble(names) %>% 
  mutate(grp = if_else(is.na(as.numeric(names)) & !names %in% exempt,  
                       rep(1:n(), each = 2L, length.out = n()),
                       if_else(names %in% exempt, 666L, NA_integer_))) %>% 
  tidyr::fill(grp)

# A tibble: 18 x 2
   value     grp
   <chr>   <int>
 1 bad         1
 2 good        1
 3 1           1
 4 2           1
 5 good        3
 6 bad         3
 7 111         3
 8 bad         4
 9 J.James     5
10 good        5
11 J.James     6
12 333         6
13 J.James     7
14 good        7
15 761         7
16 Veni      666
17 vidi      666
18 Vici      666

0人赞添加讨论(0) 举报