numpy.histogram(data, bins) is a very fast and efficient way to count how many elements of the array data fall into each bin defined by the array bins. Is there an equivalent function for the following problem? I have a matrix with R rows and C columns, and I want to bin each row of the matrix using the edges given by bins. The result should be another matrix with R rows and as many columns as there are bins.
I tried passing the matrix to numpy.histogram(data, bins), but the matrix is treated as a flat array with R*C elements, so the result is a single array with Nbins elements.
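A small example makes the flattening visible. The matrix and bin edges below are made up for illustration only:

```python
import numpy as np

# Toy input (assumed values): R = 2 rows, C = 3 columns.
data = np.array([[0.1, 0.2, 0.9],
                 [0.6, 0.7, 0.8]])
bins = np.array([0.0, 0.5, 1.0])        # 3 edges -> 2 bins

# np.histogram flattens the matrix before counting.
counts, _ = np.histogram(data, bins)
print(counts.shape)                     # (2,): one count per bin, rows mixed together
print(counts)                           # [2 4]

# What is wanted instead is one row of counts per input row:
# [[2, 1],
#  [0, 3]]
```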
Thank you everybody for your answers and comments. Finally, I found a way to speed up the binning procedure. Instead of using np.searchsorted(data), I am doing np.array(data*nbins, dtype=int). Substituting this line in the code posted by Bi Rico, I found that it becomes a factor of 3 faster. Below I post the function by Bi Rico with my modification, so that other users can easily use it.
If you're applying this to an array that has many rows, this function will give you some speedup at the cost of some temporary memory.
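Since the code itself is not shown here, the following is only a minimal sketch of what such a row-wise histogram could look like, not the original function by Bi Rico. The names hist_per_row and hist_per_row_uniform are hypothetical, and side="right" is my own choice to mimic the half-open bins of numpy.histogram. The data*nbins shortcut is only valid under the assumption that all values lie in [0, 1) and the bins have equal width.

```python
import numpy as np

def hist_per_row_uniform(data, nbins):
    """Row-wise counts for nbins equal-width bins; assumes 0 <= data < 1."""
    data = np.asarray(data)
    r = data.shape[0]
    idx = np.array(data * nbins, dtype=int)        # bin index of every element
    idx += nbins * np.arange(r).reshape(r, 1)      # give each row its own index block
    return np.bincount(idx.ravel(), minlength=r * nbins).reshape(r, nbins)

def hist_per_row(data, bins):
    """Row-wise counts for arbitrary sorted bin edges."""
    data = np.asarray(data)
    bins = np.asarray(bins)
    r = data.shape[0]
    # Index 0 means "below bins[0]"; index len(bins) means "at or above bins[-1]".
    idx = np.searchsorted(bins, data, side="right")
    step = len(bins) + 1                           # slots per row, incl. under/overflow
    idx = idx + step * np.arange(r).reshape(r, 1)  # temporary (R, C) index array
    res = np.bincount(idx.ravel(), minlength=r * step).reshape(r, step)
    return res[:, 1:-1]
```

One difference to keep in mind: in this sketch a value exactly equal to bins[-1] lands in the overflow slot, whereas numpy.histogram counts it in the last bin.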
The res[:, 1:-1] on the last line is to be consistent with numpy.histogram, which returns an array of length len(bins) - 1, but you could drop it if you want to count values that are less than bins[0] and greater than bins[-1], respectively.
Something along these lines?
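The code for this suggestion is not shown, so what follows is an assumption about the kind of approach meant: a simple, non-vectorized variant that calls numpy.histogram once per row through np.apply_along_axis. The function name hist_rows_simple and the toy data are hypothetical.

```python
import numpy as np

def hist_rows_simple(data, bins):
    # Run np.histogram on each row (axis=1) and keep only the counts.
    return np.apply_along_axis(lambda row: np.histogram(row, bins)[0], 1, data)

# Toy input (assumed values): 2 rows, 3 columns, 2 bins.
data = np.array([[0.1, 0.2, 0.9],
                 [0.6, 0.7, 0.8]])
bins = np.array([0.0, 0.5, 1.0])

result = hist_rows_simple(data, bins)
print(result)   # [[2 1]
                #  [0 3]]
```

This is easy to read but still loops over the rows in Python, so it is likely to be slower than a bincount-based approach when the matrix has many rows.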