Continuous integer runs

Posted 2019-01-15 12:54

Question:

I have some data in a list in which I need to find continuous runs of integers (my brain thinks rle, but I don't know how to use it here).

It's easier to look at the data set and explain what I'm after.

Here's the data view:

$greg
 [1]  7  8  9 10 11 20 21 22 23 24 30 31 32 33 49

$researcher
[1] 42 43 44 45 46 47 48

$sally
 [1] 25 26 27 28 29 37 38 39 40 41

$sam
 [1]  1  2  3  4  5  6 16 17 18 19 34 35 36

$teacher
[1] 12 13 14 15

Desired output:

$greg
 [1]  7:11, 20:24, 30:33, 49

$researcher
 [1] 42:48

$sally
 [1] 25:29, 37:41

$sam
 [1]  1:6, 16:19, 34:36

$teacher
 [1] 12:15

Using only base packages, how can I replace each continuous span with its lowest and highest values joined by a colon, with commas between the non-continuous parts? Note that the data goes from a list of integer vectors to a list of character vectors.

MWE data:

z <- structure(list(greg = c(7L, 8L, 9L, 10L, 11L, 20L, 21L, 22L, 
    23L, 24L, 30L, 31L, 32L, 33L, 49L), researcher = 42:48, sally = c(25L, 
    26L, 27L, 28L, 29L, 37L, 38L, 39L, 40L, 41L), sam = c(1L, 2L, 
    3L, 4L, 5L, 6L, 16L, 17L, 18L, 19L, 34L, 35L, 36L), teacher = 12:15), .Names = c("greg", 
    "researcher", "sally", "sam", "teacher"))

Answer 1:

I think diff is the solution. You might need some additional fiddling to deal with the singletons, but:

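lapply(z, function(x) {
  # Gap to the previous element; the leading 1 stands in for the first element
  diffs <- c(1, diff(x))
  # A run starts at position 1 and wherever the gap exceeds 1
  start_indexes <- c(1, which(diffs > 1))
  # A run ends just before the next start; the last run ends at length(x)
  end_indexes <- c(start_indexes[-1] - 1, length(x))
  coloned <- paste(x[start_indexes], x[end_indexes], sep=":")
  paste0(coloned, collapse=", ")
})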

$greg
[1] "7:11, 20:24, 30:33, 49:49"

$researcher
[1] "42:48"

$sally
[1] "25:29, 37:41"

$sam
[1] "1:6, 16:19, 34:36"

$teacher
[1] "12:15"
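
For the singleton fiddling mentioned above, one possible sketch (not part of the original answer; the helper name collapse_runs is made up for illustration, and the input is assumed to be a non-empty, increasing integer vector) compares each run's start and end and drops the colon when they coincide:

```r
# Hypothetical helper: same diff-based idea, but a run of length one
# is printed as "49" rather than "49:49".
collapse_runs <- function(x) {
  # A new run starts at the first element and at every element that is
  # not exactly one more than its predecessor.
  start_indexes <- c(1, which(diff(x) != 1) + 1)
  # Each run ends just before the next one starts; the last run ends
  # at the end of the vector.
  end_indexes <- c(start_indexes[-1] - 1, length(x))
  starts <- x[start_indexes]
  ends <- x[end_indexes]
  # Use "start:end" only when the run spans more than one value.
  pieces <- ifelse(starts == ends,
                   as.character(starts),
                   paste(starts, ends, sep = ":"))
  paste(pieces, collapse = ", ")
}

greg <- c(7:11, 20:24, 30:33, 49L)
collapse_runs(greg)
# [1] "7:11, 20:24, 30:33, 49"
```

Wrapped as lapply(z, collapse_runs), this should give one comma-separated string per list element, as in the desired output.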


Answer 2:

Using IRanges (a Bioconductor package, so not base R):

require(IRanges)
lapply(z, function(x) {
    t <- as.data.frame(reduce(IRanges(x,x)))[,1:2]
    apply(t, 1, function(x) paste(unique(x), collapse=":"))
})

# $greg
# [1] "7:11"  "20:24" "30:33" "49"   
# 
# $researcher
# [1] "42:48"
# 
# $sally
# [1] "25:29" "37:41"
# 
# $sam
# [1] "1:6"   "16:19" "34:36"
# 
# $teacher
# [1] "12:15"


Answer 3:

Here is an attempt using diff and tapply that returns a character vector:

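runs <- lapply(z, function(x) {
  # positions after which the sequence breaks
  breaks <- which(diff(x) != 1)
  # start and end value of every run; duplicates are kept on purpose so
  # that a run of length one contributes its value as both start and end
  results <- x[sort(c(1, length(x), breaks, breaks + 1))]
  # pair the values up: 1 1 2 2 3 3 ...
  collapse <- rep(seq_len(length(results) / 2), each = 2)
  as.vector(tapply(results, collapse, function(r) paste(unique(r), collapse = ':')))
})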

runs
$greg
[1] "7:11"  "20:24" "30:33" "49"   

$researcher
[1] "42:48"

$sally
[1] "25:29" "37:41"

$sam
[1] "1:6"   "16:19" "34:36"

$teacher
[1] "12:15"


Answer 4:

My solution is fairly similar to Marius's. His works just as well as mine, but the mechanisms are slightly different, so I thought I might as well post it:

findIntRuns <- function(run){
  rundiff <- c(1, diff(run))
  difflist <- split(run, cumsum(rundiff!=1))
  unname(sapply(difflist, function(x){
    if(length(x) == 1) as.character(x) else paste0(x[1], ":", x[length(x)])
  }))
}

lapply(z, findIntRuns)

Which produces:

$greg
[1] "7:11"  "20:24" "30:33" "49"   

$researcher
[1] "42:48"

$sally
[1] "25:29" "37:41"

$sam
[1] "1:6"   "16:19" "34:36"

$teacher
[1] "12:15"


Answer 5:

Another short solution with lapply and tapply:

lapply(z, function(x)
  unname(tapply(x, c(0, cumsum(diff(x) != 1)), FUN = function(y) 
    paste(unique(range(y)), collapse = ":")
  ))
)

The result:

$greg
[1] "7:11"  "20:24" "30:33" "49"   

$researcher
[1] "42:48"

$sally
[1] "25:29" "37:41"

$sam
[1] "1:6"   "16:19" "34:36"

$teacher
[1] "12:15"
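
To see why this works, it can help to print the grouping key for a single vector. The sketch below is an illustration only, using the greg vector from the question; the variable name key is made up here.

```r
greg <- c(7:11, 20:24, 30:33, 49L)

# diff(greg) != 1 is TRUE wherever a gap separates two neighbours.
# cumsum() increments at each gap, so every run gets its own id;
# the leading 0 labels the first element.
key <- c(0, cumsum(diff(greg) != 1))
key
# [1] 0 0 0 0 0 1 1 1 1 1 2 2 2 2 3

# range() gives c(min, max) per run; unique() collapses a singleton
# such as c(49, 49) to one value before pasting.
res <- unname(tapply(greg, key, function(y) paste(unique(range(y)), collapse = ":")))
res
# [1] "7:11"  "20:24" "30:33" "49"
```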


Answer 6:

Late to the party, but here's a deparse-based one-liner:

lapply(z, function(x) paste(sapply(split(x, cumsum(c(1, diff(x) - 1))), deparse), collapse = ", "))

$greg
[1] "7:11, 20:24, 30:33, 49L"

$researcher
[1] "42:48"

$sally
[1] "25:29, 37:41"

$sam
[1] "1:6, 16:19, 34:36"

$teacher
[1] "12:15"
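
The singleton appears as "49L" because deparse keeps the integer suffix on a length-one integer vector. If that suffix is unwanted, one possible tweak (a sketch, not part of the original answer; the names groups and pieces are made up for illustration) is to strip a trailing "L" before collapsing:

```r
greg <- c(7:11, 20:24, 30:33, 49L)

# Same grouping as the one-liner: the key grows at every break in the
# sequence, so each run lands in its own group.
groups <- split(greg, cumsum(c(1, diff(greg) - 1)))

# deparse() writes a length-one integer as "49L"; sub() drops that
# trailing "L" and leaves strings such as "7:11" untouched.
pieces <- sub("L$", "", sapply(groups, deparse))
out <- paste(pieces, collapse = ", ")
out
# [1] "7:11, 20:24, 30:33, 49"
```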