I have some data in a list in which I need to look for continuous runs of integers (my brain thinks rle, but I don't know how to use it here).
It's easier to look at the data set and explain what I'm after.
Here's the data view:
$greg
[1] 7 8 9 10 11 20 21 22 23 24 30 31 32 33 49
$researcher
[1] 42 43 44 45 46 47 48
$sally
[1] 25 26 27 28 29 37 38 39 40 41
$sam
[1] 1 2 3 4 5 6 16 17 18 19 34 35 36
$teacher
[1] 12 13 14 15
Desired output:
$greg
[1] 7:11, 20:24, 30:33, 49
$researcher
[1] 42:48
$sally
[1] 25:29, 37:41
$sam
[1] 1:6, 16:19, 34:36
$teacher
[1] 12:15
Using base packages, how can I replace each continuous span with its lowest and highest values separated by a colon, with commas between the non-continuous parts? Note that the data goes from a list of integer vectors to a list of character vectors.
MWE data:
z <- structure(list(greg = c(7L, 8L, 9L, 10L, 11L, 20L, 21L, 22L,
23L, 24L, 30L, 31L, 32L, 33L, 49L), researcher = 42:48, sally = c(25L,
26L, 27L, 28L, 29L, 37L, 38L, 39L, 40L, 41L), sam = c(1L, 2L,
3L, 4L, 5L, 6L, 16L, 17L, 18L, 19L, 34L, 35L, 36L), teacher = 12:15), .Names = c("greg",
"researcher", "sally", "sam", "teacher"))
I think diff is the solution. You might need some additional fiddling to deal with the singletons, but:
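lapply(z, function(x) {
  diffs <- c(1, diff(x))
  # A new run starts wherever the step from the previous value exceeds 1
  start_indexes <- c(1, which(diffs > 1))
  # Each run ends just before the next one starts; the last run ends at length(x)
  end_indexes <- c(start_indexes[-1] - 1, length(x))
  coloned <- paste(x[start_indexes], x[end_indexes], sep=":")
  paste0(coloned, collapse=", ")
})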
$greg
[1] "7:11, 20:24, 30:33, 49:49"
$researcher
[1] "42:48"
$sally
[1] "25:29, 37:41"
$sam
[1] "1:6, 16:19, 34:36"
$teacher
[1] "12:15"
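The singleton fiddling can be done by comparing each run's start and end values and only adding the colon when they differ. A minimal sketch of that adjustment to the code above, assuming sorted integer input; the variable name coloned_runs is illustrative, and the data are rewritten compactly with ':' ranges:

```r
# Question data written compactly with ':' ranges
z <- list(greg = c(7:11, 20:24, 30:33, 49L), researcher = 42:48,
          sally = c(25:29, 37:41), sam = c(1:6, 16:19, 34:36),
          teacher = 12:15)

coloned_runs <- lapply(z, function(x) {
  diffs <- c(1, diff(x))
  start_indexes <- c(1, which(diffs > 1))
  end_indexes <- c(start_indexes[-1] - 1, length(x))
  starts <- x[start_indexes]
  ends <- x[end_indexes]
  # Only runs longer than one element get the "start:end" form,
  # so a lone 49 stays "49" instead of becoming "49:49"
  coloned <- ifelse(starts == ends, as.character(starts),
                    paste(starts, ends, sep = ":"))
  paste0(coloned, collapse = ", ")
})
```

coloned_runs$greg should then read "7:11, 20:24, 30:33, 49".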
Using IRanges (a Bioconductor package, so not base R):
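library(IRanges)
lapply(z, function(x) {
  # Merge the width-1 ranges [x, x] into maximal runs; keep the start and end columns
  rng <- as.data.frame(reduce(IRanges(x, x)))[, 1:2]
  # unique() turns a start == end pair into a single number
  apply(rng, 1, function(r) paste(unique(r), collapse=":"))
})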
# $greg
# [1] "7:11" "20:24" "30:33" "49"
#
# $researcher
# [1] "42:48"
#
# $sally
# [1] "25:29" "37:41"
#
# $sam
# [1] "1:6" "16:19" "34:36"
#
# $teacher
# [1] "12:15"
Here is an attempt using diff and tapply, returning a character vector:
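runs <- lapply(z, function(x) {
  # Positions where the next value does not follow on consecutively
  breaks <- which(diff(x) != 1)
  # Run boundaries: first and last elements plus both sides of every break
  results <- x[sort(unique(c(1, length(x), breaks, breaks + 1)))]
  lr <- length(results)
  # Pair up consecutive boundary values (groups 1, 1, 2, 2, ...);
  # this pairing assumes a singleton can only occur as the last run
  collapse <- rep(seq_len(ceiling(lr/2)), each = 2, length.out = lr)
  as.vector(tapply(results, collapse, paste, collapse = ':'))
})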
runs
$greg
[1] "7:11" "20:24" "30:33" "49"
$researcher
[1] "42:48"
$sally
[1] "25:29" "37:41"
$sam
[1] "1:6" "16:19" "34:36"
$teacher
[1] "12:15"
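The pairing step assumes every run contributes two boundary values, so a singleton anywhere other than the end shifts the pairs. A quick check with a small hypothetical vector v, plus a sketch of one way around it that takes each run's first and last element by index (the names runs_paired and runs_fixed are illustrative only):

```r
# Hypothetical test vector with a singleton (5) in the middle
v <- c(1L, 2L, 5L, 8L, 9L)

# The pairing approach from the answer above, unchanged apart from names
runs_paired <- function(x) {
  breaks <- which(diff(x) != 1)
  results <- x[sort(unique(c(1, length(x), breaks, breaks + 1)))]
  lr <- length(results)
  collapse <- rep(seq_len(ceiling(lr / 2)), each = 2, length.out = lr)
  as.vector(tapply(results, collapse, paste, collapse = ":"))
}

# Alternative: runs start after each break and end at each break
runs_fixed <- function(x) {
  breaks <- which(diff(x) != 1)
  starts <- x[c(1, breaks + 1)]
  ends <- x[c(breaks, length(x))]
  ifelse(starts == ends, as.character(starts), paste(starts, ends, sep = ":"))
}

paired <- runs_paired(v)  # "1:2" "5:8" "9"  (5 and 8 are not consecutive)
fixed <- runs_fixed(v)    # "1:2" "5" "8:9"
```

On the question's data both functions agree, because the only singleton there (49) is the last run.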
I have a solution fairly similar to Marius's; both work, but the mechanisms are slightly different, so I thought I might as well post it:
findIntRuns <- function(run){
  rundiff <- c(1, diff(run))
  # Start a new group whenever the step is not exactly 1
  difflist <- split(run, cumsum(rundiff != 1))
  unname(sapply(difflist, function(x){
    if(length(x) == 1) as.character(x) else paste0(x[1], ":", x[length(x)])
  }))
}
lapply(z, findIntRuns)
Which produces:
$greg
[1] "7:11" "20:24" "30:33" "49"
$researcher
[1] "42:48"
$sally
[1] "25:29" "37:41"
$sam
[1] "1:6" "16:19" "34:36"
$teacher
[1] "12:15"
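To make the grouping step in findIntRuns concrete, here are the intermediate vectors for the greg element (written compactly with ':' ranges). The names group_id and difflist are only for illustration:

```r
run <- c(7:11, 20:24, 30:33, 49L)

rundiff <- c(1, diff(run))
# cumsum() over the logical vector bumps the group id at every break
group_id <- cumsum(rundiff != 1)
# group_id: 0 0 0 0 0 1 1 1 1 1 2 2 2 2 3
difflist <- split(run, group_id)
# difflist has elements "0" = 7:11, "1" = 20:24, "2" = 30:33, "3" = 49
```

Each list element is then a single run, so only its first and last values are needed.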
Another short solution with lapply and tapply:
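lapply(z, function(x)
  # Group ids increase by one at every break in the sequence
  unname(tapply(x, c(0, cumsum(diff(x) != 1)), FUN = function(y)
    # range() gives (min, max); unique() collapses a singleton to one value
    paste(unique(range(y)), collapse = ":")
  ))
)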
The result:
$greg
[1] "7:11" "20:24" "30:33" "49"
$researcher
[1] "42:48"
$sally
[1] "25:29" "37:41"
$sam
[1] "1:6" "16:19" "34:36"
$teacher
[1] "12:15"
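Several answers return one character vector per element, while the question's desired output shows the runs joined by commas. If a single string per element is wanted, one option is to collapse afterwards; this sketch reuses the tapply/range approach above, and the names runs_vec and runs_str are illustrative:

```r
# Question data written compactly with ':' ranges
z <- list(greg = c(7:11, 20:24, 30:33, 49L), researcher = 42:48,
          sally = c(25:29, 37:41), sam = c(1:6, 16:19, 34:36),
          teacher = 12:15)

# One character vector per element, as in the answer above
runs_vec <- lapply(z, function(x)
  unname(tapply(x, c(0, cumsum(diff(x) != 1)), FUN = function(y)
    paste(unique(range(y)), collapse = ":"))))

# Join each element's runs into one comma-separated string
runs_str <- lapply(runs_vec, paste, collapse = ", ")
```

runs_str keeps the list names, so runs_str$greg should be "7:11, 20:24, 30:33, 49".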
Late to the party, but here's a deparse-based one-liner:
lapply(z,function(x) paste(sapply(split(x,cumsum(c(1,diff(x)-1))),deparse),collapse=", "))
$greg
[1] "7:11, 20:24, 30:33, 49L"
$researcher
[1] "42:48"
$sally
[1] "25:29, 37:41"
$sam
[1] "1:6, 16:19, 34:36"
$teacher
[1] "12:15"
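The trailing L in "49L" comes from deparse() writing a lone integer as an integer literal, and the m:n form itself appears only because the data are integer vectors (a double vector would deparse as c(...)). A sketch that keeps the deparse idea but formats singletons with as.character(); this is an adjustment, not part of the original one-liner, and deparse_runs is an illustrative name:

```r
# Question data written compactly with ':' ranges (must stay integer for m:n output)
z <- list(greg = c(7:11, 20:24, 30:33, 49L), researcher = 42:48,
          sally = c(25:29, 37:41), sam = c(1:6, 16:19, 34:36),
          teacher = 12:15)

deparse_runs <- lapply(z, function(x) {
  # Group id jumps by (gap - 1) at each break, so every run gets its own id
  groups <- split(x, cumsum(c(1, diff(x) - 1)))
  # deparse() renders a consecutive integer vector as "m:n";
  # a length-one integer would come out as "49L", so handle it separately
  parts <- vapply(groups, function(g)
    if (length(g) == 1) as.character(g) else deparse(g), character(1))
  paste(parts, collapse = ", ")
})
```

deparse_runs$greg should then be "7:11, 20:24, 30:33, 49".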