numpy.histogram(data, bins) is a very fast and efficient way to count how many elements of the data array fall into each bin defined by the array bins. Is there an equivalent function for the following problem? I have a matrix with R rows and C columns. I want to bin each row of the matrix using the bin definition given by bins. The result should be another matrix with R rows and with the number of columns equal to the number of bins.
I tried passing the matrix directly to numpy.histogram(data, bins), but the matrix is treated as a flat array with R*C elements, so the result is a single array with Nbins elements.
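For comparison, the straightforward approach is to call numpy.histogram once per row in a Python loop. The sketch below does that. hist_per_row_loop and the small data/bins arrays are hypothetical names chosen for illustration. It works for any sorted bins but pays Python-level overhead for every row, which is the cost the vectorized answers try to avoid.

```python
import numpy as np

def hist_per_row_loop(data, bins):
    """Hypothetical reference implementation: one np.histogram call per row."""
    data = np.asarray(data)
    # One output row per input row, with len(bins) - 1 counts each.
    out = np.empty((data.shape[0], len(bins) - 1), dtype=np.intp)
    for i, row in enumerate(data):
        out[i], _ = np.histogram(row, bins)
    return out

data = np.array([[0.1, 0.2, 0.7],
                 [0.9, 0.95, 0.5]])
bins = np.array([0.0, 0.5, 1.0])
counts = hist_per_row_loop(data, bins)
# counts.tolist() == [[2, 1], [0, 3]]
```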
If you're applying this to an array with many rows, this function will give you some speed-up over calling numpy.histogram once per row, at the cost of some temporary memory.
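import numpy as np

def hist_per_row(data, bins):
    data = np.asarray(data)
    bins = np.asarray(bins)
    # bins must be sorted in increasing order
    assert np.all(bins[:-1] <= bins[1:])
    r, c = data.shape
    # Bin index of each element: 0 means below bins[0], len(bins) means
    # at or above bins[-1]. side='right' puts a value equal to an interior
    # edge into the upper bin, as numpy.histogram does.
    idx = bins.searchsorted(data, side='right')
    # Give each row its own block of len(bins) + 1 counters.
    step = len(bins) + 1
    last = step * r
    idx += np.arange(0, last, step).reshape((r, 1))
    # A single bincount call counts every row at once.
    res = np.bincount(idx.ravel(), minlength=last)
    res = res.reshape((r, step))
    return res[:, 1:-1]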
The res[:, 1:-1] on the last line keeps the result consistent with numpy.histogram, which returns an array of length len(bins) - 1. You can drop it if you also want to count the values that fall below bins[0] and above bins[-1] (the first and last columns, respectively).
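To see why the row offsets let a single bincount do all the work, here is the same computation traced by hand on a 2x3 matrix (the data and bins values are made up for illustration). This sketch passes side='right' to searchsorted so that a value equal to an interior edge (0.5 here) goes into the upper bin, matching numpy.histogram; with the default side='left' it would be counted in the lower bin instead.

```python
import numpy as np

# Illustrative input: 2 rows, bins with 2 intervals.
data = np.array([[0.1, 0.2, 0.7],
                 [0.9, 0.95, 0.5]])
bins = np.array([0.0, 0.5, 1.0])

# Per-element bin index in 0..len(bins); 0 and len(bins) are out of range.
idx = bins.searchsorted(data, side='right')
# idx.tolist() == [[1, 1, 2], [2, 2, 2]]

# Shift row k by k * step so the rows occupy disjoint index ranges.
step = len(bins) + 1            # 4 slots per row: under, 2 bins, over
r = data.shape[0]
offsets = np.arange(0, step * r, step).reshape((r, 1))  # [[0], [4]]
flat = (idx + offsets).ravel()  # [1, 1, 2, 6, 6, 6]

# One bincount over the shifted indices, then fold back into rows.
counts = np.bincount(flat, minlength=step * r).reshape((r, step))
# counts.tolist() == [[0, 2, 1, 0], [0, 0, 3, 0]]
result = counts[:, 1:-1]
# result.tolist() == [[2, 1], [0, 3]]
```

Each row's counts live in their own block of step slots, so no count can leak into a neighbouring row as long as every index stays in 0..len(bins).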
Thank you everybody for your answers and comments. In the end I found a way to speed up the binning further. Instead of bins.searchsorted(data), I divide data by bins[-1] and then compute the bin index directly with np.array(data*nbins, dtype=int). Substituting this into the code posted by Bi Rico, I found it becomes about three times faster. Note that this shortcut only gives correct counts when the bins are equally spaced from 0 to bins[-1], e.g. bins = np.linspace(0, bins[-1], nbins + 1). Below is Bi Rico's function with my modification, so that other users can easily reuse it.
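import numpy as np

def hist_per_row(data, bins):
    data = np.asarray(data)
    bins = np.asarray(bins)
    assert np.all(bins[:-1] <= bins[1:])
    r, c = data.shape
    nbins = len(bins) - 1
    # Only valid for equally spaced bins from 0 to bins[-1], i.e.
    # bins == np.linspace(0, bins[-1], nbins + 1): the bin index is then
    # computed arithmetically instead of with searchsorted.
    data = data / bins[-1]
    # floor (not a plain int cast) so small negative values are not
    # truncated up to 0; +1 leaves column 0 for underflow.
    idx = np.floor(data * nbins).astype(int) + 1
    # Send out-of-range values to the under/overflow columns instead of
    # letting them spill into a neighbouring row's block.
    np.clip(idx, 0, nbins + 1, out=idx)
    step = len(bins) + 1
    last = step * r
    idx += np.arange(0, last, step).reshape((r, 1))
    res = np.bincount(idx.ravel(), minlength=last)
    res = res.reshape((r, step))
    return res[:, 1:-1]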
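The arithmetic index is only a drop-in replacement for searchsorted when the bins are uniform and start at 0. A small check, assuming a hypothetical helper arithmetic_index that isolates the index formula used in the modified function (without the clipping or the row offsets): it agrees with searchsorted for np.linspace bins but silently misassigns values when the bins are unevenly spaced.

```python
import numpy as np

def arithmetic_index(data, bins):
    # Hypothetical helper: the index formula from the modified function.
    # Assumes uniform bins on [0, bins[-1]].
    nbins = len(bins) - 1
    return (np.asarray(data) / bins[-1] * nbins).astype(int) + 1

data = np.array([0.05, 0.3, 0.75])

uniform = np.linspace(0.0, 1.0, 5)                   # [0, 0.25, 0.5, 0.75, 1]
print(arithmetic_index(data, uniform))               # [1 2 4]
print(uniform.searchsorted(data, side='right'))      # [1 2 4]

non_uniform = np.array([0.0, 0.1, 1.0])
print(arithmetic_index(data, non_uniform))           # [1 1 2]: 0.3 lands in [0, 0.1)
print(non_uniform.searchsorted(data, side='right'))  # [1 2 2]
```

For non-uniform edges, the searchsorted version above remains the safe choice.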
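Something along these lines?
import numpy as np

data = np.random.rand(10, 20)
bins = np.linspace(0, 1, 11)  # the same edges for every row
# Apply np.histogram to each row (axis 1) and keep only the counts.
print(np.apply_along_axis(lambda x: np.histogram(x, bins)[0], 1, data))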
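One caveat with this approach: when np.histogram is called without a bins argument, it uses 10 bins spanning each row's own minimum to maximum, so the columns of different rows do not refer to the same intervals. A minimal check on made-up data, followed by the fix of passing one shared bins array:

```python
import numpy as np

data = np.array([[0.0, 0.5, 1.0],
                 [0.0, 1.0, 2.0]])

# Default bins: the edges follow each row's own min..max range.
edges_row0 = np.histogram(data[0])[1]
edges_row1 = np.histogram(data[1])[1]
print(edges_row0[-1], edges_row1[-1])  # 1.0 2.0

# Passing the same bins to every row makes the columns comparable.
bins = np.linspace(0.0, 2.0, 5)        # [0, 0.5, 1, 1.5, 2]
counts = np.apply_along_axis(lambda x: np.histogram(x, bins)[0], 1, data)
# counts.tolist() == [[1, 1, 1, 0], [1, 0, 1, 1]]
```

apply_along_axis still calls the lambda once per row in Python, so it is mainly a convenience rather than a speed-up over an explicit loop.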