my power spectra are believable? a comparison betw

2019-03-31 03:06发布

问题:

Could anyone kindly point out why I get very different results?
There are many peaks which should not appear.In fact,there should be only one peak.
I am a python newbie and all comments about my code below are welcome.

The test data is here.enter link description here
You can directly wget https://clbin.com/YJkwr .
Its first column are the arrival times for a series of photons.A light source emits photons randomly. The total time is 55736s and there are 67398 photons. I try to detect some kind of periodicity of light intensity.
We can bin the time and light intensity is proportional to the photon numbers in every time bin.

I tried numpy.fft and lomb-scargle of scipy.signal to make its power spectrum,but got very different results.

fft

 import pylab as pl
 import numpy as np

 timepoints=np.loadtxt('timesequence',usecols=(0,),unpack=True,delimiter=",")
 binshu=50000
 interd=54425./binshu  
 t=np.histogram(timepoints,bins=binshu)[0]
 sp = np.fft.fft(t)
 freq = np.fft.fftfreq(len(t),d=interd)  
 freqnum = np.fft.fftfreq(len(t),d=interd).argsort()  
 pl.xlabel("frequency(Hz)")
 pl.plot(freq[freqnum],np.abs(sp)[freqnum])

lomb-scargle

 timepoints=np.loadtxt('timesequence',usecols=(0,),unpack=True,delimiter=",")
 binshu=50000
 intensity=np.histogram(timepoints,bins=binshu)[0].astype('float64') 
 middletime=np.histogram(timepoints,bins=binshu)[1][1:binshu+1]-np.histogram(timepoints,bins=binshu)[1][3]*0.5
 freq1=1./(timepoints.max()-timepoints.min())
 freq2=freq1*len(timepoints)
 freqs=np.linspace(freq1,freq2,1000)
 pm1=spectral.lombscargle(middletime,intensity,freqs)

 pl.xlabel("frequency(Hz)")
 pl.plot(freqs,pm1)
 pl.xlim(0.5*freq1,2*freq2)
 pl.ylim(0,250)
 pl.show()

*********************************
Thank you, Warren,I appreciate it.Thank you for your detailed reply.

You are right,there is an integration time,here it is around 1.7s.
There are many single exposures.Every exposure costs 1.7s
In one single exposure,we can not tell its arrival time precisely.

If the time series are like:
0 1.7 3.4 8.5 8.5

The integration time of the last two photons are 1.7s,not (8.5-3.4)s.So I will revise part of your code.

HOWEVER my question still remains. You adjust several parameters to get the 0.024Hz peak in lomb-scargle peak to some degree. And you use this to guide your parameters in fft.

If you do not know the the number 0.024, maybe you may use different parameters to get a different highest peak ?

How to guarantee every time we can get right the num_ls_freqs ? You can see if we choose different num_ls_freqs,the highest peak shifts.

If I have many timing series, every time I have to specify different parameters? And how to get them?

回答1:

It appears that 50000 bins is not enough. If you change binshu, you'll see that the locations of the spikes change, and not just a little.

Before diving into FFT or Lomb-Scargle, I found it useful to investigate the raw data first. Here are the beginning and end of the file:

0,1,3.77903
0,1,4.96859
0,1,1.69098
1.74101,1,4.87652
1.74101,1,5.15564
1.74101,1,2.73634
3.48202,1,3.18583
3.48202,1,4.0806
5.2229,1,1.86738
6.96394,1,7.27398
6.96394,1,3.59345
8.70496,1,4.13443
8.70496,1,2.97584
...
55731.7,1,5.74469
55731.7,1,8.24042
55733.5,1,4.43419
55733.5,1,5.02874
55735.2,1,3.94129
55735.2,1,3.54618
55736.9,1,3.99042
55736.9,1,5.6754
55736.9,1,7.22691

There are clusters of identical arrival times. (Is this the raw output of the photon detector, or has this file been preprocessed in some way?)

So the first step is to coalesce these clusters into a single number at a single time, so the data looks like (timepoint, count), e.g.:

0, 3
1.74101, 3
3.48202, 2
5.2229, 1
etc

The following will accomplish that counting:

times, inv = np.unique(timepoints, return_inverse=True)
counts = np.bincount(inv)

Now times is an array of increasing unique time values, and counts is the number of photons detected at that time. (Note that I'm guessing that this is a correct interpretation of the data!)

Let's see if the sampling is anywhere near uniform. The array of interarrival times is

dt = np.diff(times)

If the sampling was perfectly uniform, all the values in dt would be the same. We can use the same pattern used above on timepoints to find the unique values (and frequency of their occurrence) in dt:

dts, inv = np.unique(dt, return_inverse=True)
dt_counts = np.bincount(inv)

For example, here's what dts looks like if we print it:

[  1.7       1.7       1.7       1.7       1.74      1.74      1.74      1.74
   1.74      1.74      1.74088   1.741     1.741     1.741     1.741
   1.741     1.741     1.741     1.74101   1.74102   1.74104   1.7411
   1.7411    1.7411    1.7411    1.7411    1.742     1.742     1.742
   1.746     1.75      1.8       1.8       1.8       1.8       3.4       3.4
   3.4       3.4       3.48      3.48      3.48      3.48      3.48      3.482
   3.482     3.482     3.482     3.482     3.4821    3.483     3.483     3.49
   3.49      3.49      3.49      3.49      3.5       3.5       5.2       5.2
   5.2       5.2       5.22      5.22      5.22      5.22      5.22      5.223
   5.223     5.223     5.223     5.223     5.223     5.23      5.23      5.23
   5.3       5.3       5.3       5.3       6.9       6.9       6.9       6.9
   6.96      6.96      6.964     6.964     6.9641    7.        8.7       8.7
   8.7       8.7       8.7       8.71     10.4      10.5      12.2    ]

(The apparent repeated values are actually different. The full precision of the numbers is not shown.) If the sampling was perfectly uniform, there would be just one number in that list. Instead we see clusters of numbers, with no obvious dominant values. (There is a predominance of multiples of 1.74--is that a characteristic of the detector?)

Based on that observation, we'll start with Lomb-Scargle. The script below includes code to compute and plot the Lomb-Scargle periodogram of the (times, counts) data. The frequencies given to lombscargle vary from 1/trange, where trange is the full time span of the data, to 1/dt.min(). The number of frequencies is 16000 (num_ls_freqs in the script). Some trial-and-error showed that this is roughly the minimum number needed to resolve the spectrum. Less than this, and the peaks start moving around. More than this, and there is little change in the spectrum. The calculation shows that there is a peak at 0.0242 Hz. The other peaks are harmonics of this frequency.

Now that we have an estimate of the fundamental frequency, we can use it to guide the choice of the bin size in a histogram of timepoints to be used in an FFT calculation. We'll use a bin size that results in oversampling the fundamental frequency by a factor of 8. (In the script below, this is m.) That is, we do

m = 8
nbins = int(m * ls_peak_freq * trange + 0.5)
hist, bin_edges = np.histogram(timepoints, bins=nbins, density=True)

(timepoints is the original set of times read from the file; it includes many repeated time values, as noted above.)

Then we apply the FFT to hist. (We'll actually use numpy.fft.rfft.) The following is the plot of the spectrum computed using the FFT:

As expected, there is a peak at 0.0242 Hz.

Here's the script:

import numpy as np
from scipy.signal import lombscargle
import matplotlib.pyplot as plt


timepoints = np.loadtxt('timesequence', usecols=(0,), delimiter=",")

# Coalesce the repeated times into the `times` and `counts` arrays.
times, inv = np.unique(timepoints, return_inverse=True)
counts = np.bincount(inv)

# Check the sample spacing--is the sampling uniform?
dt = np.diff(times)
dts, inv = np.unique(dt, return_inverse=True)
dt_counts = np.bincount(inv)
print dts
# Inspection of `dts` shows that the smallest dt is about 1.7, and there
# are many near multiples of 1.74,  but the sampling is not uniform,
# so we'll analyze the spectrum using lombscargle.


# First remove the mean from the data.  This is not essential; it just
# removes the large value at the 0 frequency that we don't care about.
counts0 = counts - counts.mean()

# Minimum interarrival time.
dt_min = dt.min()

# Total time span.
trange = times[-1] - times[0]

# --- Lomb-Scargle calculation ---
num_ls_freqs = 16000
ls_min_freq = 1.0 / trange
ls_max_freq = 1.0 / dt_min
freqs = np.linspace(ls_min_freq, ls_max_freq, num_ls_freqs)
ls_pgram = lombscargle(times, counts0, 2*np.pi*freqs)

ls_peak_k = ls_pgram.argmax()
ls_peak_freq = freqs[ls_peak_k]
print "ls_peak_freq  =", ls_peak_freq


# --- FFT calculation of the binned data ---
# Assume the Lomb-Scargle calculation gave a good estimate
# of the fundamental frequency.  Use a bin size for the histogram
# of timepoints that oversamples that period by m.
m = 8
nbins = int(m * ls_peak_freq * trange + 0.5)
hist, bin_edges = np.histogram(timepoints, bins=nbins, density=True)
delta = bin_edges[1] - bin_edges[0]

fft_coeffs = np.fft.rfft(hist - hist.mean())
fft_freqs = np.fft.fftfreq(hist.size, d=delta)[:fft_coeffs.size]
# Hack to handle the case when hist.size is even.  `fftfreq` puts
# -nyquist where we want +nyquist.
fft_freqs[-1] = abs(fft_freqs[-1])

fft_peak_k = np.abs(fft_coeffs).argmax()
fft_peak_freq = fft_freqs[fft_peak_k]
print "fft_peak_freq =", fft_peak_freq


# --- Lomb-Scargle plot ---
plt.figure(1)
plt.clf()
plt.plot(freqs, ls_pgram)
plt.title('Spectrum computed by Lomb-Scargle')
plt.annotate("%6.4f Hz" % ls_peak_freq, 
             xy=(ls_peak_freq, ls_pgram[ls_peak_k]),
             xytext=(10, -10), textcoords='offset points')
plt.xlabel('Frequency (Hz)')
plt.grid(True)


# --- FFT plot ---
plt.figure(2)
plt.clf()
plt.plot(fft_freqs, np.abs(fft_coeffs)**2)
plt.annotate("%6.4f Hz" % fft_peak_freq,
             xy=(fft_peak_freq, np.abs(fft_coeffs[fft_peak_k])**2),
             xytext=(10, -10), textcoords='offset points')
plt.title("Spectrum computed by FFT")
plt.grid(True)

plt.show()


回答2:

The power spectrum is the absolute value of the fourier transform squared. If on top of that you get rid of the DC value in your FFT, plot only the positive half of your results, and compute the bin width from the output of np.histogram instead of with some hardcoded magic number, this is what you get:

timepoints=np.loadtxt('timesequence',usecols=(0,),unpack=True,delimiter=",")
binshu=50000
t, _=np.histogram(timepoints,bins=binshu)
interd = _[1] - _[0]
sp = np.fft.fft(t)
freq = np.fft.fftfreq(len(t),d=interd)
n = len(freq)
pl.xlabel("frequency(Hz)")
pl.plot(freq[1:n//2],(sp*sp.conj())[1:n//2])

This looks much, much similar to your lombscargle periodogram. There is the mismatch in the position of the big peak, which is around 0.32 in the FFT based spectrum, and more like 0.4x in your lombscargle output. Not sure what's going on in there, but I don't fully understand your frequency calculations, especially in the lombscargle case...