Could anyone kindly point out why I get very different results?
There are many peaks which should not appear.In fact,there should be only one peak.
I am a python newbie and all comments about my code below are welcome.
The test data is here.enter link description here
You can directly wget https://clbin.com/YJkwr
.
Its first column are the arrival times for a series of photons.A light source emits photons randomly.
The total time is 55736s and there are 67398 photons.
I try to detect some kind of periodicity of light intensity.
We can bin the time and light intensity is proportional to the photon numbers in every time bin.
I tried numpy.fft and lomb-scargle of scipy.signal to make its power spectrum,but got very different results.
fft
import pylab as pl
import numpy as np
timepoints=np.loadtxt('timesequence',usecols=(0,),unpack=True,delimiter=",")
binshu=50000
interd=54425./binshu
t=np.histogram(timepoints,bins=binshu)[0]
sp = np.fft.fft(t)
freq = np.fft.fftfreq(len(t),d=interd)
freqnum = np.fft.fftfreq(len(t),d=interd).argsort()
pl.xlabel("frequency(Hz)")
pl.plot(freq[freqnum],np.abs(sp)[freqnum])
lomb-scargle
timepoints=np.loadtxt('timesequence',usecols=(0,),unpack=True,delimiter=",")
binshu=50000
intensity=np.histogram(timepoints,bins=binshu)[0].astype('float64')
middletime=np.histogram(timepoints,bins=binshu)[1][1:binshu+1]-np.histogram(timepoints,bins=binshu)[1][3]*0.5
freq1=1./(timepoints.max()-timepoints.min())
freq2=freq1*len(timepoints)
freqs=np.linspace(freq1,freq2,1000)
pm1=spectral.lombscargle(middletime,intensity,freqs)
pl.xlabel("frequency(Hz)")
pl.plot(freqs,pm1)
pl.xlim(0.5*freq1,2*freq2)
pl.ylim(0,250)
pl.show()
*********************************
Thank you, Warren,I appreciate it.Thank you for your detailed reply.
You are right,there is an integration time,here it is around 1.7s.
There are many single exposures.Every exposure costs 1.7s
In one single exposure,we can not tell its arrival time precisely.
If the time series are like:
0 1.7 3.4 8.5 8.5
The integration time of the last two photons are 1.7s
,not (8.5-3.4)s
.So I will revise part of your code.
HOWEVER my question still remains. You adjust several parameters to get the 0.024Hz
peak in lomb-scargle peak to some degree. And you use this to guide your parameters in fft.
If you do not know the the number 0.024
, maybe you may use different parameters to get a different highest peak ?
How to guarantee every time we can get right the num_ls_freqs
? You can see if we choose different num_ls_freqs
,the highest peak shifts.
If I have many timing series, every time I have to specify different parameters? And how to get them?
It appears that 50000 bins is not enough. If you change
binshu
, you'll see that the locations of the spikes change, and not just a little.Before diving into FFT or Lomb-Scargle, I found it useful to investigate the raw data first. Here are the beginning and end of the file:
There are clusters of identical arrival times. (Is this the raw output of the photon detector, or has this file been preprocessed in some way?)
So the first step is to coalesce these clusters into a single number at a single time, so the data looks like (timepoint, count), e.g.:
The following will accomplish that counting:
Now
times
is an array of increasing unique time values, andcounts
is the number of photons detected at that time. (Note that I'm guessing that this is a correct interpretation of the data!)Let's see if the sampling is anywhere near uniform. The array of interarrival times is
If the sampling was perfectly uniform, all the values in
dt
would be the same. We can use the same pattern used above ontimepoints
to find the unique values (and frequency of their occurrence) indt
:For example, here's what
dts
looks like if we print it:(The apparent repeated values are actually different. The full precision of the numbers is not shown.) If the sampling was perfectly uniform, there would be just one number in that list. Instead we see clusters of numbers, with no obvious dominant values. (There is a predominance of multiples of 1.74--is that a characteristic of the detector?)
Based on that observation, we'll start with Lomb-Scargle. The script below includes code to compute and plot the Lomb-Scargle periodogram of the (
times
,counts
) data. The frequencies given tolombscargle
vary from1/trange
, wheretrange
is the full time span of the data, to1/dt.min()
. The number of frequencies is 16000 (num_ls_freqs
in the script). Some trial-and-error showed that this is roughly the minimum number needed to resolve the spectrum. Less than this, and the peaks start moving around. More than this, and there is little change in the spectrum. The calculation shows that there is a peak at 0.0242 Hz. The other peaks are harmonics of this frequency.Now that we have an estimate of the fundamental frequency, we can use it to guide the choice of the bin size in a histogram of
timepoints
to be used in an FFT calculation. We'll use a bin size that results in oversampling the fundamental frequency by a factor of 8. (In the script below, this ism
.) That is, we do(
timepoints
is the original set of times read from the file; it includes many repeated time values, as noted above.)Then we apply the FFT to
hist
. (We'll actually usenumpy.fft.rfft
.) The following is the plot of the spectrum computed using the FFT:As expected, there is a peak at 0.0242 Hz.
Here's the script:
The power spectrum is the absolute value of the fourier transform squared. If on top of that you get rid of the DC value in your FFT, plot only the positive half of your results, and compute the bin width from the output of
np.histogram
instead of with some hardcoded magic number, this is what you get:This looks much, much similar to your lombscargle periodogram. There is the mismatch in the position of the big peak, which is around 0.32 in the FFT based spectrum, and more like 0.4x in your lombscargle output. Not sure what's going on in there, but I don't fully understand your frequency calculations, especially in the lombscargle case...