Converting NMR ascii file to peak list

2019-06-14 17:17发布

I have some Bruker NMR spectra that i am using to create a program as part of a project. My program needs to work on the actual spectrum. So i converted the 1r files of the Bruker NMR spectra to ASCII. For Carnitine this is what the ascii file looks like(this is not the complete list. The complete list runs into thousands of lines. This is only a snapshot):

-0.807434   -23644  
-0.807067   -22980  
-0.806701   -22967  
-0.806334   -24513  
-0.805967   -27609  
-0.805601   -31145  
-0.805234   -33951  
-0.804867   -35553  
-0.804501   -35880  
-0.804134   -35240  
-0.803767   -34626  
-0.8034  -34613 
-0.803034   -34312  
-0.802667   -32411  
-0.8023  -28925 
-0.801934   -25177  
-0.801567   -22132  
-0.8012  -19395 

and this is what the spectrum is: alt text http://www.bmrb.wisc.edu/metabolomics/standards/L_carnitine/nmr/bmse000211/spectra_png/1H.png

My program has to identify the peaks from this data. So i need to know how to interpret these numbers. And how exactly they are converted into their appropriate values in the spectrum. So far this is what i have learnt:

1.) The first column represents the spectral point position (ppm)

2.) The second column represents the intensity of each peak.

3.) notice that in the second column there are some numbers which are not perfectly aligned but are closer to the first column. Eg:-34613, -28925, -19395. I think this is significant.

For the sake of full disclosure- I am doing the programming in R.

NOTE: I have also asked this in Biostar but i think i have a better chance of getting an answer here than there because not many people seem to be answering questions there.

EDIT: This is one solution that i have found is plausible:

A friend gave me the idea to use an awk script to check where exactly the intensities in the file change from positive to negative to find the local maxima. Here is a working script:

awk 'BEGIN{dydx = 0;}
{ 
  if(NR > 1)
     { dydx = ($2 - y0)/($1 - x0); } 
  if(NR > 2 && last * dydx < 0)
     { printf( "%.4f  %.4f\n", (x0 + $1)/2, log((dydx<0)?-dydx:dydx)); } ;
  last=dydx; x0=$1; y0=$2
}' /home/chaitanya/Work/nmr_spectra/caffeine/pdata/1/spectrumtext.txt  | awk '$2 > 17'

Tell me if you dont understand it. I will improve the explanation.

Also, there is this related question i asked.

2条回答
何必那么认真
2楼-- · 2019-06-14 17:50

Package PRocess has a function to find peaks in spectra. There are many more, if you search for something like "peak finding R"

查看更多
甜甜的少女心
3楼-- · 2019-06-14 17:55

Here's a worked example with a reproducible piece of code. I don't claim it's any good with regard to the strategy or coding, but it could get you started.

find_peaks <- function (x, y, n.fine = length(x), interval = range(x), ...) {
  maxdif <- max(diff(x)) # longest distance between successive points

  ## selected interval for the search
  range.ind <- seq(which.min(abs(x - interval[1])),
                   which.min(abs(x - interval[2])))
  x <- x[range.ind]
  y <- y[range.ind]

  ## smooth the data
  spl <- smooth.spline(x, y, ...)
  ## finer x positions
  x.fine <- seq(range(x)[1], range(x)[2], length = n.fine)
  ## predicted y positions
  y.spl <- predict(spl, x.fine, der = 0)$y
  ## testing numerically the second derivative
  test <- diff(diff((y.spl), 1) > 0, 1)
  maxima <- which(test == -1) + 1

  ## according to this criterion, we found rough positions
  guess <- data.frame(x=x.fine[maxima], y=y.spl[maxima])

  ## cost function to maximize 
  obj <- function(x) predict(spl, x)$y

  ## optimize the peak position around each guess
  fit <- data.frame(do.call(rbind,
          lapply(guess$x, function(g) {
            fit <- optimize(obj, interval = g + c(-1,1) * maxdif, maximum=TRUE)
            data.frame(x=fit$maximum,y=fit$objective)
          })))

  ## return both guesses and fits
  invisible(list(guess=guess, fit=fit))
}

set.seed(123)
x <- seq(1, 15, length=100)
y <- jitter(cos(x), a=0.2)

plot(x,y)
res <- find_peaks(x,y)
points(res$guess,col="blue")
points(res$fit,col="red")

test

查看更多
登录 后发表回答