Continuous integer runs

2019-01-15 12:26发布

I have some data in a list that I need to look for continuous runs of integers (My brain thinkrle but don't know how to use it here).

It's easier to look at the data set and explain what I'm after.

Here's the data view:

$greg
 [1]  7  8  9 10 11 20 21 22 23 24 30 31 32 33 49

$researcher
[1] 42 43 44 45 46 47 48

$sally
 [1] 25 26 27 28 29 37 38 39 40 41

$sam
 [1]  1  2  3  4  5  6 16 17 18 19 34 35 36

$teacher
[1] 12 13 14 15

Desired output:

$greg
 [1]  7:11, 20:24, 30:33, 49

$researcher
 [1] 42:48

$sally
 [1] 25:29, 37:41

$sam
 [1]  1:6, 16:19 34:36

$teacher
 [1] 12:15

Use base packages how can I replace continuous span with a colon between highest and lowest and commas in between non the non continuous parts? Note that the data goes from a list of integer vectors to a list of character vectors.

MWE data:

z <- structure(list(greg = c(7L, 8L, 9L, 10L, 11L, 20L, 21L, 22L, 
    23L, 24L, 30L, 31L, 32L, 33L, 49L), researcher = 42:48, sally = c(25L, 
    26L, 27L, 28L, 29L, 37L, 38L, 39L, 40L, 41L), sam = c(1L, 2L, 
    3L, 4L, 5L, 6L, 16L, 17L, 18L, 19L, 34L, 35L, 36L), teacher = 12:15), .Names = c("greg", 
    "researcher", "sally", "sam", "teacher"))

6条回答
萌系小妹纸
2楼-- · 2019-01-15 12:44

Late to the party, but here's a deparse based one-liner:

lapply(z,function(x) paste(sapply(split(x,cumsum(c(1,diff(x)-1))),deparse),collapse=", "))
$greg
[1] "7:11, 20:24, 30:33, 49L"

$researcher
[1] "42:48"

$sally
[1] "25:29, 37:41"

$sam
[1] "1:6, 16:19, 34:36"

$teacher
[1] "12:15"
查看更多
做自己的国王
3楼-- · 2019-01-15 12:49

Another short solution with lapply and tapply:

lapply(z, function(x)
  unname(tapply(x, c(0, cumsum(diff(x) != 1)), FUN = function(y) 
    paste(unique(range(y)), collapse = ":")
  ))
)

The result:

$greg
[1] "7:11"  "20:24" "30:33" "49"   

$researcher
[1] "42:48"

$sally
[1] "25:29" "37:41"

$sam
[1] "1:6"   "16:19" "34:36"

$teacher
[1] "12:15"
查看更多
我命由我不由天
4楼-- · 2019-01-15 12:52

I think diff is the solution. You might need some additional fiddling to deal with the singletons, but:

lapply(z, function(x) {
  diffs <- c(1, diff(x))
  start_indexes <- c(1, which(diffs > 1))
  end_indexes <- c(start_indexes - 1, length(x))
  coloned <- paste(x[start_indexes], x[end_indexes], sep=":")
  paste0(coloned, collapse=", ")
})

$greg
[1] "7:11, 20:24, 30:33, 49:49"

$researcher
[1] "42:48"

$sally
[1] "25:29, 37:41"

$sam
[1] "1:6, 16:19, 34:36"

$teacher
[1] "12:15"
查看更多
地球回转人心会变
5楼-- · 2019-01-15 12:58

I have a fairly similar solution to Marius, his works as well as mine but the mechanisms are slightly different so I thought I may as well post it:

findIntRuns <- function(run){
  rundiff <- c(1, diff(run))
  difflist <- split(run, cumsum(rundiff!=1))
  unname(sapply(difflist, function(x){
    if(length(x) == 1) as.character(x) else paste0(x[1], ":", x[length(x)])
  }))
}

lapply(z, findIntRuns)

Which produces:

$greg
[1] "7:11"  "20:24" "30:33" "49"   

$researcher
[1] "42:48"

$sally
[1] "25:29" "37:41"

$sam
[1] "1:6"   "16:19" "34:36"

$teacher
[1] "12:15"
查看更多
老娘就宠你
6楼-- · 2019-01-15 12:59

Using IRanges:

require(IRanges)
lapply(z, function(x) {
    t <- as.data.frame(reduce(IRanges(x,x)))[,1:2]
    apply(t, 1, function(x) paste(unique(x), collapse=":"))
})

# $greg
# [1] "7:11"  "20:24" "30:33" "49"   
# 
# $researcher
# [1] "42:48"
# 
# $sally
# [1] "25:29" "37:41"
# 
# $sam
# [1] "1:6"   "16:19" "34:36"
# 
# $teacher
# [1] "12:15"
查看更多
甜甜的少女心
7楼-- · 2019-01-15 13:01

Here is an attempt using diff and tapply returning a character vector

runs <- lapply(z, function(x) {
  z <- which(diff(x)!=1); 
  results <- x[sort(unique(c(1,length(x), z,z+1)))]
  lr <- length(results)
  collapse <- rep(seq_len(ceiling(lr/2)),each = 2, length.out = lr)
  as.vector(tapply(results, collapse, paste, collapse = ':'))
  })

runs
$greg
[1] "7:11"  "20:24" "30:33" "49"   

$researcher
[1] "42:48"

$sally
[1] "25:29" "37:41"

$sam
[1] "1:6"   "16:19" "34:36"

$teacher
[1] "12:15"
查看更多
登录 后发表回答