Performing a phase correlation with fft in R

2019-04-01 05:31发布

问题:

I am trying to implement a 2d phase correlation algorithm in R using a recipe from Wikipedia (http://en.wikipedia.org/wiki/Phase_correlation) in order to track the movement between 2 images. These images (frames) were captured with a camera shaking in the wind and the ultimate goal is to remove the shake in these and subsequent frames. The two example images and the R code are below:

## we will need the tiff library 
library(tiff)

## read in the tiff files 
f1=as.matrix(readTIFF('f1.tiff',native=TRUE))
f2=as.matrix(readTIFF('f2.tiff',native=TRUE))

## take the fft of the first  frame
F1 <- fft(f1)
## take the Conjugate fft of the second frame
F2.c <- Conj(fft(f2))

## calculate the cross power spectrum according to the wiki article
R <- (F1*F2.c)/abs(F1*F2.c)
## take the inverse fft of R
r <- fft(R,inv=TRUE)/length(R)
## because the zero valued imaginary numbers are not needed
r <- Re(r)

## show the normalized cross-correlation
image(r)

## find the max in the cross correlation matrix, or the phase shift -
## between the two images
shift <- which(r==max(r),arr.ind=TRUE)

The vector shift, to my understanding, should contain information on the transitive shift (dx and dy) that best corrects these two images. However the shift variable gives dx=1 and dy=1, which I assume indicates no shift in either the x or y direction. This occurs for subsequent frames where there are visible shifts or several pixels in both the x and y direction.

Do any of y'all see an error in my code/formulas? Or do I need to try something fancier like filtering the images first before I do a phase correlation?

Cheers gals and guys!

回答1:

The code looks correct from what I know about phase correlation. If I understand what you want correctly, you are trying to use phase correlation to determine the offset between two images given that their homographies are nothing more than horizontal and vertical offsets. The fact that you're only getting the shift to be at the origin is most likely due to your images lacking sufficient high frequency information in order to properly determine a good shift.

Try these two images instead (these were from the Wikipedia article you referenced, but I extracted them out and saved them as individual images):

When I run these two images with your R code, I get this for my phase correlation map. Bear in mind that your images were actually saved as .png, so I had to change the library to library(png) and I used readPNG instead of readTIFF. Keep that in mind when you try and run your code with the above example images:

Also, the location of where the maximum peak occurred was:

> shift
     row col
[1,] 132 153

This tells us that the image shifted over by 132 rows and 153 columns. Take note that this is with respect to the centre of the image. If you want to determine the actual offset, you'll need to subtract this by half the rows for the vertical coordinate and half the columns for the horizontal coordinate.

Therefore, the code works totally fine... it's just that your images lack sufficient high frequency information for the phase correlation to work. What correlation is trying to do in this case is that we're trying to find "similar" variations between each image. If there are a lot of variations between each image and are very similar, then phase correlation will work well. However, if we don't have that much variation, then phase correlation won't work.

Why is that the case? The basis behind phase correlation is that we assume that the image is corrupted with Gaussian white noise, and so if we correlate white noise with itself (from one image to another) it will give a very nice high peak at where the offset or the shift is and almost zero everywhere. Because of the fact that your images lack a lot of high frequency information and the fact that the images are clean, then phase correlation actually won't work. Therefore, what some people actually suggest is to pre-whiten your image so that the image contains white noise so that you can get the nice peak at where the offset should be that we're talking about.

However, just to make sure that you eliminate any false maximums, it is a good idea to also smoothen the cross-correlation matrix in the frequency domain (r in your R code) so that there is a high probability that there will only be one true maximum. Using a Gaussian filter in the frequency / FFT domain should work fine.

In any case, I don't see much variation in your images and so something to take away from this is that you gotta make sure your image has a lot of high frequency information for this to work!



回答2:

Below is qualitative description of the routine followed by the R code used to efficiently and robustly find the phase correlation between the two images posted in the original question. Thanks to @rayreng for the advice and pointing me in the right direction!

  1. Read in both images
  2. Add noise to the second image
  3. Transform both images into the frequency spectrum using fft
  4. convolve transformed images using a multiplication
  5. return to the spatial domain with a inverse fft. This is your normalized cross correlation
  6. Rearrange the normalized cross correlation matrix so that the zero frequency is in the middle of the matrix (similar to fftshift in matlab)
  7. Use a 2d Gaussian distribution to smooth the normalized cross correlated matrix
  8. Determine the shift by identifying the maximum indexed value of the smoothed normalized correlated matrix
  9. Be aware that this function also uses a custom 2d Gaussian generator (see below) and a custom function similar to matlabs fftshift.

     ### R CODE ###
     rm(list=ls())
     library(tiff)
     ## read in the tiff images 
     f1 <- readTIFF('f1.tiff',native=TRUE)
     f1 <- matrix(f1,ncol=ncol(f1),nrow=nrow(f1)) 
    
    
     ## take the fft of f1 
     F1 <- fft(f1)
    
     ## what is the subsequent frame?
     f2 <-readTIFF('f2.tiff',native=TRUE)
     f2 <- matrix(f2,ncol=ncol(f2),nrow=nrow(f2))
    
     ## create a vector of random noise the same length as f2
     noise.b <- runif(length(f2),min(range(f2)),max(range(f2)))
     ## add the noise to the f2
     f2 <- noise.b+f2   
    
    ## take the fft and conjugate of the f2
    F2.c <- Conj(fft(f2))
    
    ## calculate the cross-power spectrum
    R <- (F1*F2.c)/abs(F1*F2.c)
    ## calculate the normalized cross-correlation with fft inverse
    r <- fft(R,inv=TRUE)/length(R)
    ## rearrange the r matrix so that zero frequency component is in the -
    ## middle of the matrix.  This is similar to the function - 
    ## fftshift in matlab 
    
    fftshift <- function(x) {
    if(class(x)=='matrix') {
        rd2 <- floor(nrow(x)/2)
        cd2 <- floor(ncol(x)/2)
    
        ## Identify the first, second, third, and fourth quadrants 
        q1 <- x[1:rd2,1:cd2]
        q2 <- x[1:rd2,(cd2+1):ncol(x)]
        q3 <- x[(rd2+1):nrow(x),(cd2+1):ncol(x)]
        q4 <- x[(rd2+1):nrow(x),1:cd2]
    
        ## rearrange the quadrants 
        centered.t <- rbind(q4,q1)
        centered.b <- rbind(q3,q2)
        centered <- cbind(centered.b,centered.t)
    
        return(Re(centered))             
    }
    if(class(x)!='matrix') {
        print('sorry, this class of input x is not supported yet')
        }
    }
    
    ## now use the defined function fftshift on the matrix r
    r <- fftshift(r)
    r <- Re(r)
    
    ## try and smooth the matrix r to find the peak!
    ## first build a 2d gaussian matrix  
    sig = 5 ## define a sigma 
    
    ## determine the rounded half dimensions of the r matrix 
    x.half.dim <- floor(ncol(r)/2)
    y.half.dim <- floor(nrow(r)/2)
    
    ## create x and y vectors that correspond to the indexed row
    ## and column values of the r matrix.  
    xs <- rep(-x.half.dim:x.half.dim,ncol(r))
    ys <- rep(-y.half.dim:y.half.dim,each=nrow(r))
    
    ## apply the gaussian blur formula 
    ## (http://en.wikipedia.org/wiki/Gaussian_blur)
    ## to every x and y indexed value
    gaus <- 1/(2*pi*sig^2) * exp(-(xs^2 + ys^2)/(2*sig^2))
    gaus <- matrix(gaus,ncol=ncol(r),nrow=nrow(r),byrow=FALSE)
    
    ## now convolve the gaus with r in the frequency domain
    r.filt <- fft((fft(r)*fft(gaus)),inv=TRUE)/length(r)
    r.filt <- fftshift(Re(r.filt))
    
    ## find dx and dy with the peak in r    
    min.err <- which(r.filt==max(r.filt),arr.ind=TRUE)
    
    ## how did the image move from the previous? 
    shift <- (dim(f1)+3)/2-min.err
    

The vector shift indicates that the image was shifted in the x positive direction and in negative y direction. In other words the second image (f2) was moved roughly to the upper right. The values of the vector shift will vary slightly with each attempt due to the noise introduced into the second image along with the smoothing operator from the Gaussian filter on the r matrix. I have noticed that a phase correlation similar to the one outlined above works better on larger images/matrices. To see the results of the above algorithm, visit the stabilized video at https://www.youtube.com/watch?v=irDFk2kbKaE.