In my models, one of the most frequently repeated tasks is counting the occurrences of each element in an array. The elements come from a closed set, so I know there are X types of elements, and all or some of them populate the array, along with zeros that represent 'empty' cells. The array is not sorted in any way and can be quite long (about 1M elements), and this task is done thousands of times during one simulation (which is itself one of hundreds of simulations). The result should be a vector r of size X, so that r(k) is the number of occurrences of k in the array.
Example:
For X = 9, if I have the following input vector:
v = [0 7 8 3 0 4 4 5 3 4 4 8 3 0 6 8 5 5 0 3]
I would like to get this result:
r = [0 0 4 4 3 1 1 3 0]
Note that I don't want the count of zeros, and that elements that don't appear in the array (like 2) have a 0 in the corresponding position of the result vector (r(2) == 0).
What would be the fastest way to achieve this goal?
tl;dr: The fastest method depends on the size of the array. For arrays smaller than 2^14 elements, method 3 below (accumarray) is faster. For arrays larger than that, method 2 below (histcounts) is better.

UPDATE: I also tested this with implicit broadcasting, which was introduced in R2016b, and the results are almost equal to the bsxfun approach, with no significant difference for this method (relative to the other methods).

Let's see what methods are available to perform this task. For the following examples we will assume X has n elements, from 1 to n, and our array of interest is M, which is a column array that can vary in size. Our result vector will be spp (see footnote 1), such that spp(k) is the number of ks in M. Although I write here about X, there is no explicit implementation of it in the code below; I just define n = 500, and X is implicitly 1:500.
The naive for loop
The simplest and most straightforward way to cope with this task is a for loop that iterates over the elements in X and counts the number of elements in M that are equal to each of them. This is of course not so smart, especially if only a small group of elements from X populates M, so we had better look first for those that are already in M.
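The two loop variants described above could be sketched roughly as follows. This is an illustrative sketch rather than the original listing: the variable names spp_loop, spp_uloop and present are assumptions, and the data is the example vector from the question (so n = 9 here, not 500).

```matlab
% Example data: the vector from the question, stored as a column
M = [0 7 8 3 0 4 4 5 3 4 4 8 3 0 6 8 5 5 0 3].';
n = 9;

% Variant 1: loop over every possible element 1..n
spp_loop = zeros(n, 1);
for k = 1:n
    spp_loop(k) = sum(M == k);
end

% Variant 2: loop only over the elements actually present in M
spp_uloop = zeros(n, 1);
present = unique(M(M > 0));      % distinct non-zero values in M
for k = present.'                % iterate over a row vector
    spp_uloop(k) = sum(M == k);
end
```

The second variant pays for a call to unique up front, which only helps when few of the n types actually occur in M.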
Usually, in MATLAB, it is advisable to take advantage of the built-in functions as much as possible, since most of the time they are much faster. I thought of 5 options to do so:
1. The function tabulate
The function tabulate returns a very convenient frequency table that at first sight seems to be the perfect solution for this task. The only fix needed is to remove the first row of the table if it counts the 0 element (there may be no zeros in M).
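A possible sketch of the tabulate approach, assuming the Statistics and Machine Learning Toolbox is available. Note that it differs slightly from the description above: instead of removing the zero row afterwards, it drops the zeros before calling tabulate, so the input contains only positive integers and the table has one row for every integer from 1 to the largest value present. The names nz and tab are assumptions.

```matlab
% Example data: the vector from the question, stored as a column
M = [0 7 8 3 0 4 4 5 3 4 4 8 3 0 6 8 5 5 0 3].';
n = 9;

nz = M(M > 0);                   % drop the 'empty' cells first
spp = zeros(n, 1);               % types missing from M keep a count of 0
if ~isempty(nz)
    tab = tabulate(nz);          % columns: value, count, percentage
    spp(tab(:, 1)) = tab(:, 2);  % copy the counts into their positions
end
```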
2. The function histcounts
Another option that can be tweaked quite easily to our needs is histcounts. Here, in order to count all the different elements from 1 to n separately, we define the edges to be 1:n+1, so every element in X has its own bin. We could also write histcounts(M(M>0),'BinMethod','integers'), but I already tested it, and it takes more time (though it makes the function independent of n).
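The histcounts call with explicit edges might look like this minimal sketch, using the example vector from the question. The zeros need no special handling here because they fall below the first edge and are simply not counted.

```matlab
% Example data: the vector from the question, stored as a column
M = [0 7 8 3 0 4 4 5 3 4 4 8 3 0 6 8 5 5 0 3].';
n = 9;

% Edges 1:n+1 give n bins, one per integer 1..n; zeros lie outside the edges
spp = histcounts(M, 1:n+1);      % 1-by-n row vector of counts
```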
3. The function accumarray
The next option I'll bring here is the function accumarray. Here we give the function M(M>0) as input, to skip the zeros, and use 1 as the vals input to count all unique elements.
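A minimal sketch of the accumarray approach, again on the example vector from the question. Passing the output size [n 1] explicitly is an assumption of this sketch; it ensures that types absent from M still get a zero entry. The name idx is also an assumption.

```matlab
% Example data: the vector from the question, stored as a column
M = [0 7 8 3 0 4 4 5 3 4 4 8 3 0 6 8 5 5 0 3].';
n = 9;

idx = M(M > 0);                      % subscripts must be positive integers
spp = accumarray(idx(:), 1, [n 1]);  % add 1 at position idx(i) for every i
```

The idx(:) guards against M being a row vector, in which case accumarray would interpret the row as a single multi-dimensional subscript.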
4. The function bsxfun
We can even use the binary operation @eq (i.e. ==) to look for all elements of each type. If we keep the first input M and the second 1:n in different dimensions, so that one is a column vector and the other is a row vector, then the function compares each element in M with each element in 1:n and creates a length(M)-by-n logical matrix that we can sum to get the desired result.
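The comparison described above could be written as in the following sketch, which also shows the implicit-expansion form mentioned in the update (R2016b or later). The names eqMat, spp_bsx and spp_imp are assumptions.

```matlab
% Example data: the vector from the question, stored as a column
M = [0 7 8 3 0 4 4 5 3 4 4 8 3 0 6 8 5 5 0 3].';
n = 9;

% Column M against row 1:n gives a length(M)-by-n logical matrix
eqMat = bsxfun(@eq, M, 1:n);
spp_bsx = sum(eqMat, 1);         % column sums: 1-by-n counts

% The same comparison using implicit expansion (R2016b or later)
spp_imp = sum(M == 1:n, 1);
```

The intermediate matrix has length(M)*n elements, so memory use grows quickly with the size of M.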
5. The function ndgrid
Another option, similar to the bsxfun approach, is to explicitly create the two matrices of all possibilities using the ndgrid function. We then compare them and sum over columns to get the final result.
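A sketch of the ndgrid variant under the same assumptions as before (example vector from the question; the names Mgrid and Ngrid are hypothetical):

```matlab
% Example data: the vector from the question, stored as a column
M = [0 7 8 3 0 4 4 5 3 4 4 8 3 0 6 8 5 5 0 3].';
n = 9;

% Both outputs are length(M)-by-n: Mgrid repeats M along the columns,
% Ngrid repeats 1:n along the rows
[Mgrid, Ngrid] = ndgrid(M, 1:n);
spp = sum(Mgrid == Ngrid, 1);    % column sums: 1-by-n counts
```

Unlike the bsxfun version, this stores two full numeric matrices in addition to the logical comparison result.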
Benchmarking
I did a little test to find the fastest of the methods mentioned above. I defined n = 500 for all trials. For some methods (especially the naive for loop), n has a great impact on the execution time, but this is not the issue here since we want to test for a given n.
Here are the results:
We can notice several things:
- For arrays smaller than 2^14, accumarray is the fastest. For arrays larger than 2^14, histcounts is the fastest.
- The for loops, in both versions, are the slowest, but for arrays smaller than 2^8 the "unique & for" option is slower.
- ndgrid becomes the slowest for arrays bigger than 2^11, probably because of the need to store very large matrices in memory.
- [...] tabulate works on arrays of size smaller than 2^9. This result was consistent (with some variation in the pattern) in all the trials I conducted.

(The bsxfun and ndgrid curves are truncated because larger sizes make my computer hang, and the trend is quite clear already.)

Also, notice that the y-axis is in log10, so a decrease of one unit (like for arrays of size 2^19, between accumarray and histcounts) means a 10-times faster operation.

I'll be glad to hear in the comments about improvements to this test, and if you have another, conceptually different method, you are most welcome to suggest it as an answer.
The code
Here are all the functions wrapped in a timing function:
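As a rough illustration of how such timings can be taken (this is a minimal sketch, not the original listing), the two fastest methods could be timed with the built-in timeit function. The array size 2^12 and the names t_accum and t_hist are assumptions.

```matlab
n = 500;
M = randi([0 n], 2^12, 1);       % random test array, zeros included

% timeit calls each handle several times and returns a typical run time
t_accum = timeit(@() accumarray(M(M > 0), 1, [n 1]));
t_hist  = timeit(@() histcounts(M, 1:n+1));

fprintf('accumarray: %.3g s, histcounts: %.3g s\n', t_accum, t_hist);
```

Repeating this for a range of array sizes and plotting the times on logarithmic axes gives a comparison of the kind discussed above.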
And here is the script to run this code and produce the graph:
1 The reason for this weird name comes from my field, ecology. My models are cellular automata that typically simulate individual organisms in a virtual space (the M above). The individuals are of different species (hence spp) and all together form what is called an "ecological community". The "state" of the community is given by the number of individuals of each species, which is the spp vector in this answer. In these models, we first define a species pool (X above) for the individuals to be drawn from, and the community state takes into account all species in the species pool, not only those present in M.
We know that the input vector always contains integers, so why not use this to "squeeze" a bit more performance out of the algorithm?
I've been experimenting with some optimizations of the two best binning methods suggested by the OP, and this is what I came up with:
- The number of element types (X in the question, or n in the example) should be explicitly converted to an (unsigned) integer type.
- [...] (the accumi_new function below).

This function takes about 30 sec to run on my machine. I'm using MATLAB R2016a.