Power Spectrum and Autocorrelation of Data in Nump

I am interested in computing the power spectrum of a system of particles (~100,000) in 3D space with Python. What I have found so far is a group of functions in Numpy (fft,fftn,..) which compute the discrete Fourier transform, of which the square of the absolute value is the power spectrum. My question is a matter of how my data are being represented - and truthfully may be fairly simple to answer.

The data structure I have is an array which has a shape of (n,2), n being the number of particles I have, and each column representing either the x, y, and z coordinate of the n particles. The function I believe I should be using it the fftn() function, which takes the discrete Fourier transform of an n-dimensional array - but it says nothing about the format. How should the data be represented as a data structure to be fed into fftn?

Here is what I've tried so far to test the function:

import numpy as np
import random
import matplotlib.pyplot as plt

DATA = np.zeros((100,3))

for i in range(len(DATA)):
    DATA[i,0] = random.uniform(-1,1)
    DATA[i,1] = random.uniform(-1,1)
    DATA[i,2] = random.uniform(-1,1)

FFT = np.fft.fftn(DATA)
PS = abs(FFT)**2

plt.plot(PS)
plt.show()

The array entitled DATA is a mock array, ultimately the thing which will be 100,000 by 3 in shape. The output of the code gives me something like:

As you can see, I think this is giving me three 1D power spectra (1 for each column of my data), but really I'd like a power spectrum as a function of radius.

Does anybody have any advice or alternative methods/packages they know of to compute the power spectrum (I'd even settle for the two point autocorrelation function).

It doesn't quite work the way you are setting it out...

You need a function, lets call it f(x, y, z), that describes the density of mass in space. In your case, you can consider the galaxies as point masses, so you will have a delta function centered at the location of each galaxy. It is for this function that you can calculate the three-dimensional autocorrelation, from which you could calculate the power spectrum.

If you want to use numpy to do that for you, you are first going to have to discretize your function. A possible mock example would be:

import numpy as np
import matplotlib.pyplot as plt

space = np.zeros((100, 100, 100), dtype=np.uint8)

x, y, z = np.random.randint(100, size=(3, 1000))
space[x, y, z] += 1

space_ps = np.abs(np.fft.fftn(space))
space_ps *= space_ps

space_ac = np.fft.ifftn(space_ps).real.round()
space_ac /= space_ac[0, 0, 0]

And now space_ac holds the three-dimensional autocorrelation function for the data set. This is not quite what you are after, and to get you one-dimensional correlation function you would have to average the values on spherical shells around the origin:

dist = np.minimum(np.arange(100), np.arange(100, 0, -1))
dist *= dist
dist_3d = np.sqrt(dist[:, None, None] + dist[:, None] + dist)
distances, _ = np.unique(dist_3d, return_inverse=True)
values = np.bincount(_, weights=space_ac.ravel()) / np.bincount(_)

plt.plot(distances[1:], values[1:])

There is another issue with doing things yourself this way: when you compute the power spectrum as above, mathematically is as if your three dimensional array wrapped around the borders, i.e. point [999, y, z] is a neighbour to [0, y, z]. So your autocorrelation could show two very distant galaxies as close neighbours. The simplest way to deal with this is by making your array twice as large along every dimension, padding with extra zeros, and then discarding the extra data.

Alternatively you could use scipy.ndimage.filters.correlate with mode='constant' to do all the dirty work for you.