Finding local maxima and minima

Posted 2019-01-01 02:35

Question:

I'm looking for a computationally efficient way to find local maxima/minima for a large list of numbers in R, ideally without for loops.

For example, given data like 1 2 3 2 1 1 2 1, I want the function to return 3 and 7, which are the positions of the local maxima.

Answer 1:

diff(diff(x)) (or diff(x, differences=2); thanks to @ZheyuanLi) essentially computes the discrete analogue of the second derivative, so it should be negative at local maxima. The +1 below takes care of the fact that the result of diff is shorter than the input vector.

edit: added @Tommy's correction for cases where delta-x is not 1...

tt <- c(1,2,3,2,1, 1, 2, 1)
which(diff(sign(diff(tt)))==-2)+1

My suggestion above ( http://statweb.stanford.edu/~tibs/PPC/Rdist/ ) is intended for the case where the data are noisier.
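
As a rough illustration of why the one-liner above works, the intermediate vectors can be inspected one at a time. The names d, s, turn, maxima and minima are introduced here for illustration only; the minima line simply mirrors the maxima test with +2.

```r
tt <- c(1, 2, 3, 2, 1, 1, 2, 1)

d    <- diff(tt)   # first differences:   1  1 -1 -1  0  1 -1
s    <- sign(d)    # direction of travel: 1  1 -1 -1  0  1 -1
turn <- diff(s)    # change in direction: 0 -2  0  1  1 -2

# A jump from +1 to -1 (turn == -2) marks a strict local maximum;
# a jump from -1 to +1 (turn == +2) marks a strict local minimum.
maxima <- which(turn == -2) + 1   # 3 7
minima <- which(turn ==  2) + 1   # empty: the valley at positions 5 and 6 is flat
```

The flat valley shows the limitation of this form: a plateau puts a 0 into sign(d), so the direction never changes by a full 2.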



Answer 2:

@Ben's solution is pretty sweet. It doesn't handle the following cases though:

# all these return numeric(0):
x <- c(1,2,9,9,2,1,1,5,5,1) # duplicated points at maxima 
which(diff(sign(diff(x)))==-2)+1 
x <- c(2,2,9,9,2,1,1,5,5,1) # duplicated points at start
which(diff(sign(diff(x)))==-2)+1 
x <- c(3,2,9,9,2,1,1,5,5,1) # start is maxima
which(diff(sign(diff(x)))==-2)+1

Here's a more robust (and slower, uglier) version:

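localMaxima <- function(x) {
  # Pad the front with a very small value so the first difference is positive.
  # Use -Inf instead if x is numeric (non-integer)
  y <- diff(c(-.Machine$integer.max, x)) > 0L
  # Cumulative run lengths give the last index of each rising/non-rising run
  y <- cumsum(rle(y)$lengths)
  # Odd-numbered runs are the rising ones; each ends at a local maximum
  y <- y[seq.int(1L, length(y), 2L)]
  # Drop the first candidate when the vector starts with a repeated value
  if (x[[1]] == x[[2]]) {
    y <- y[-1]
  }
  y
}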

x <- c(1,2,9,9,2,1,1,5,5,1)
localMaxima(x) # 3, 8
x <- c(2,2,9,9,2,1,1,5,5,1)
localMaxima(x) # 3, 8
x <- c(3,2,9,9,2,1,1,5,5,1)
localMaxima(x) # 1, 3, 8
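
To make the rle/cumsum steps less opaque, here is a small walk-through of the intermediate values for the first test vector. It uses -Inf as the pad, as the comment in the function suggests for non-integer input; the variable names are illustrative only.

```r
x <- c(1, 2, 9, 9, 2, 1, 1, 5, 5, 1)

rising <- diff(c(-Inf, x)) > 0   # TRUE where x goes up; the pad makes element 1 TRUE
runs   <- rle(rising)$lengths    # lengths of alternating rising / non-rising runs: 3 4 1 2
ends   <- cumsum(runs)           # last index of each run: 3 7 8 10
peaks  <- ends[seq.int(1L, length(ends), 2L)]   # ends of the rising runs: 3 8
```

Because the first run is always a rising one, the odd-numbered run ends are exactly the positions where a climb stops, which is the first element of each peak plateau.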


Answer 3:

Use the zoo library function rollapply:

library(zoo)
x <- c(1, 2, 3, 2, 1, 1, 2, 1)
xz <- as.zoo(x)
rollapply(xz, 3, function(x) which.min(x)==2)
#    2     3     4     5     6     7 
#FALSE FALSE FALSE  TRUE FALSE FALSE 
rollapply(xz, 3, function(x) which.max(x)==2)
#    2     3     4     5     6     7 
#FALSE  TRUE FALSE FALSE FALSE  TRUE 

Then pull the index using 'coredata' for those values where 'which.max' returns the "center value", signaling a local maximum. You could obviously do the same for local minima using which.min instead of which.max.

 rxz <- rollapply(xz, 3, function(x) which.max(x)==2)
 index(rxz)[coredata(rxz)]
#[1] 3 7

I am assuming you do not want the starting or ending values, but if you do, you could pad the ends of your vectors before processing, rather like telomeres do on chromosomes.

(I'm noting the ppc package ("Peak Probability Contrasts") for doing mass spectrometry analyses, simply because I was unaware of its availability until reading @BenBolker's comment above, and I think adding these few words will increase the chances that someone with a mass-spec interest will see this on a search.)
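
The padding idea can be sketched as follows. To keep the example free of package dependencies it swaps rollapply for base R's embed(), and it uses strict comparisons instead of which.max, so tie handling differs from the zoo version. Padding with -Inf is an assumption that suits maxima (Inf would be the counterpart for minima), and the variable names are illustrative.

```r
x <- c(3, 2, 3, 2, 1, 1, 2, 5)

# Pad both ends so the first and last values get a full window of width 3
padded <- c(-Inf, x, -Inf)

# Each row of w is one window; the columns hold (right, center, left) neighbours
w <- embed(padded, 3)

# Row i is centered on x[i], so the row numbers are already positions in x
is_peak <- w[, 2] > w[, 1] & w[, 2] > w[, 3]
peaks <- which(is_peak)   # 1 3 8
```

Without the padding, positions 1 and 8 could never be reported, because they would have no window centered on them.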



Answer 4:

I took a stab at this today. I know you said hopefully without for loops, but I stuck with the apply family of functions. It is fairly compact and fast, and it lets you specify a threshold greater than 1.

The function:

inflect <- function(x, threshold = 1){
  up   <- sapply(1:threshold, function(n) c(x[-(seq(n))], rep(NA, n)))
  down <-  sapply(-1:-threshold, function(n) c(rep(NA,abs(n)), x[-seq(length(x), length(x) - abs(n) + 1)]))
  a    <- cbind(x,up,down)
  list(minima = which(apply(a, 1, min) == a[,1]), maxima = which(apply(a, 1, max) == a[,1]))
}
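
A quick check on the question's example vector, with the function repeated verbatim so that the block runs on its own. Reading the code, threshold appears to be the number of neighbours compared on each side; the names tt, r1 and r2 are illustrative.

```r
inflect <- function(x, threshold = 1){
  up   <- sapply(1:threshold, function(n) c(x[-(seq(n))], rep(NA, n)))
  down <-  sapply(-1:-threshold, function(n) c(rep(NA,abs(n)), x[-seq(length(x), length(x) - abs(n) + 1)]))
  a    <- cbind(x,up,down)
  list(minima = which(apply(a, 1, min) == a[,1]), maxima = which(apply(a, 1, max) == a[,1]))
}

tt <- c(1, 2, 3, 2, 1, 1, 2, 1)

r1 <- inflect(tt, threshold = 1)
r1$maxima   # 3 7
r1$minima   # 5 6 (both ends of the flat valley qualify, since ties count)

r2 <- inflect(tt, threshold = 2)
r2$maxima   # 3 (position 7 is within 2 of the end, so its window contains NA)
r2$minima   # 5 6
```

Points closer to either end than the threshold are never reported, because min() and max() return NA for rows that contain the NA padding.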

To visualize it or play with thresholds, you can run the following code:

# Pick a desired threshold # to plot up to
n <- 2
# Generate Data
randomwalk <- 100 + cumsum(rnorm(50, 0.2, 1)) # climbs upwards most of the time
bottoms <- lapply(1:n, function(x) inflect(randomwalk, threshold = x)$minima)
tops <- lapply(1:n, function(x) inflect(randomwalk, threshold = x)$maxima)
# Color functions
cf.1 <- grDevices::colorRampPalette(c("pink","red"))
cf.2 <- grDevices::colorRampPalette(c("cyan","blue"))
plot(randomwalk, type = 'l', main = "Minima & Maxima\nVariable Thresholds")
for(i in 1:n){
  points(bottoms[[i]], randomwalk[bottoms[[i]]], pch = 16, col = cf.1(n)[i], cex = i/1.5)
}
for(i in 1:n){
  points(tops[[i]], randomwalk[tops[[i]]], pch = 16, col = cf.2(n)[i], cex = i/1.5)
}
legend("topleft", legend = c("Minima",1:n,"Maxima",1:n), 
       pch = rep(c(NA, rep(16,n)), 2), col = c(1, cf.1(n),1, cf.2(n)), 
       pt.cex =  c(rep(c(1, c(1:n) / 1.5), 2)), cex = .75, ncol = 2)



Answer 5:

There are some good solutions provided, but it depends on what you need.

Just diff(tt) returns the differences.

You want to detect when you go from increasing values to decreasing values. One way to do this is provided by @Ben:

 diff(sign(diff(tt)))==-2

The problem here is that this will only detect changes that go immediately from strictly increasing to strictly decreasing.

A slight change will allow for repeated values at the peak (returning TRUE for the last occurrence of the peak value):

 diff(diff(x)>=0)<0

Then, you simply need to properly pad the front and back if you want to detect maxima at the beginning or end of the vector.

Here's everything wrapped in a function (including finding of valleys):

 which.peaks <- function(x,partial=TRUE,decreasing=FALSE){
     if (decreasing){
         if (partial){
             which(diff(c(FALSE,diff(x)>0,TRUE))>0)
         }else {
             which(diff(diff(x)>0)>0)+1
         }
     }else {
         if (partial){
             which(diff(c(TRUE,diff(x)>=0,FALSE))<0)
         }else {
             which(diff(diff(x)>=0)<0)+1
         }
     }
 }
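
The three expressions discussed in this answer can be compared side by side. The test vectors are borrowed from the earlier answer about duplicated points, and the variable names are illustrative; the padded form is the same expression as the partial = TRUE maxima branch of the function.

```r
x <- c(1, 2, 9, 9, 2, 1, 1, 5, 5, 1)

# Strict test: misses both peaks because they are plateaus
strict  <- which(diff(sign(diff(x))) == -2) + 1   # empty

# Plateau-tolerant test: reports the last element of each plateau
plateau <- which(diff(diff(x) >= 0) < 0) + 1      # 4 9

# Padded test on a vector that starts with a maximum
y <- c(3, 2, 9, 9, 2, 1, 1, 5, 5, 1)
padded  <- which(diff(c(TRUE, diff(y) >= 0, FALSE)) < 0)   # 1 4 9
```

Note that the padded form needs no +1 offset: the extra TRUE at the front already shifts every position by one.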


Answer 6:

Answer by @42- is great, but I had a use case where I didn't want to use zoo. It's easy to implement this with dplyr using lag and lead:

library(dplyr)
# tibble() replaces the deprecated data_frame()
test = tibble(x = sample(1:10, 20, replace = TRUE))
mutate(test, local.minima = if_else(lag(x) > x & lead(x) > x, TRUE, FALSE))

Like the rollapply solution, you can control the window size and edge cases through the lag/lead arguments n and default, respectively.
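
A minimal sketch of what that could look like, assuming dplyr is installed. The example data and the names min1 and min2 are made up for illustration, and x is stored as doubles so that default = Inf matches its type.

```r
library(dplyr)

x <- c(3, 1, 2, 0, 4, 5, 1)

# One neighbour per side; padding with Inf lets the endpoints qualify as minima
min1 <- x < dplyr::lag(x, default = Inf) & x < dplyr::lead(x, default = Inf)
which(min1)   # 2 4 7

# Two neighbours per side: additionally compare against the values at n = 2
min2 <- min1 &
  x < dplyr::lag(x, n = 2, default = Inf) &
  x < dplyr::lead(x, n = 2, default = Inf)
which(min2)   # 4 7
```

Leaving default at its NA value instead makes the comparisons at the edges NA, which is how the mutate() call in this answer behaves for the first and last rows.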



Answer 7:

Here's the solution for minima:

@Ben's solution:

x <- c(1,2,3,2,1,2,1)
which(diff(sign(diff(x)))==+2)+1 # 5

Note the edge cases described in Tommy's post!

@Tommy's solution:

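localMinima <- function(x) {
  # Pad the front with a very large value so the first difference is negative.
  # Use Inf instead if x is numeric (non-integer)
  y <- diff(c(.Machine$integer.max, x)) > 0L
  # Cumulative run lengths give the last index of each run
  y <- cumsum(rle(y)$lengths)
  # Odd-numbered runs are the non-rising ones; each ends at a local minimum
  y <- y[seq.int(1L, length(y), 2L)]
  # Drop the first candidate when the vector starts with a repeated value
  if (x[[1]] == x[[2]]) {
    y <- y[-1]
  }
  y
}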

x <- c(1,2,9,9,2,1,1,5,5,1)
localMinima(x) # 1, 7, 10
x <- c(2,2,9,9,2,1,1,5,5,1)
localMinima(x) # 7, 10
x <- c(3,2,9,9,2,1,1,5,5,1)
localMinima(x) # 2, 7, 10

Note: neither localMaxima nor localMinima can handle duplicated maxima/minima at the start!



Answer 8:

I had some trouble getting the locations to work in the previous solutions, so I came up with a way to grab the minima and maxima directly. The code below does this and plots the result, marking the minima in red and the maxima in green. Unlike the which.max() function, this pulls the indices of all minima/maxima out of a data frame. The zero prepended inside the first diff() call compensates for the result of diff() being one element shorter than its input. Inserting it into the innermost diff() call saves having to add an offset outside the logical expression. It doesn't matter much, but I feel it's a cleaner way to do it.

# create example data called stockData
stockData = data.frame(x = 1:30, y=rnorm(30,7))

# get the location of the minima/maxima. note the added zero offsets  
# the location to get the correct indices
min_indexes = which(diff(  sign(diff( c(0,stockData$y)))) == 2)
max_indexes = which(diff(  sign(diff( c(0,stockData$y)))) == -2)

# get the actual values where the minima/maxima are located
min_locs = stockData[min_indexes,]
max_locs = stockData[max_indexes,]

# plot the data and mark minima with red and maxima with green
plot(stockData$y, type="l")
points( min_locs, col="red", pch=19, cex=1  )
points( max_locs, col="green", pch=19, cex=1  )
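
As a small check of the zero-prepending trick described in this answer, the two ways of correcting the index can be compared directly. The vectors y and z are made up for illustration.

```r
y <- c(1, 2, 3, 2, 1, 1, 2, 1)

with_offset <- which(diff(sign(diff(y))) == -2) + 1     # 3 7
with_zero   <- which(diff(sign(diff(c(0, y)))) == -2)   # 3 7

# Caveat: the first value is compared against the artificial 0, so a series
# that starts above 0 and then falls reports a maximum at position 1.
z <- c(5, 3, 4)
first_max <- which(diff(sign(diff(c(0, z)))) == -2)     # 1
```

Whether position 1 shows up therefore depends on the sign of the first value, which may or may not be what you want for the start of the series.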


Answer 9:

I posted this elsewhere, but I think this is an interesting way to go about it. I'm not sure what its computational efficiency is, but it's a very concise way of solving the problem.

vals=rbinom(1000,20,0.5)

text=paste0(substr(format(diff(vals),scientific=TRUE),1,1),collapse="")

sort(na.omit(c(gregexpr('[ ]-',text)[[1]]+1,ifelse(grepl('^-',text),1,NA),
 ifelse(grepl('[^-]$',text),length(vals),NA))))


Answer 10:

In the pracma package, use the findpeaks function:

library(pracma)
tt <- c(1,2,3,2,1, 1, 2, 1)
tt_peaks <- findpeaks(tt, zero = "0", peakpat = NULL,
       minpeakheight = -Inf, minpeakdistance = 1, threshold = 0, npeaks = 0, sortstr = FALSE)
tt_peaks

     [,1] [,2] [,3] [,4]
[1,]    3    3    1    5
[2,]    2    7    6    8

That returns a matrix with 4 columns. The first column shows the heights of the local peaks, the 2nd column gives their indices, and the 3rd and 4th columns are the start and end of each peak (with potential overlap).

See https://www.rdocumentation.org/packages/pracma/versions/1.9.9/topics/findpeaks for details.

One caveat: I used it on a series of non-integers, and every peak was reported one index too late; I do not know why. So I had to manually subtract 1 from my index vector (no big deal).



Tags: r