I am having some trouble developing a suitably fast binning algorithm in Mathematica. I have a large (~100k elements) data set of the form T={{x1,y1,z1},{x2,y2,z2},....} and I want to bin it into a 2D array of around 100x100 bins, with the bin value being given by the sum of the Z values that fall into each bin.
Currently I am iterating through each element of the table, using Select to pick out which bin it is supposed to be in based on lists of bin boundaries, and adding the z value to a list of values occupying that bin. At the end I map Total onto the list of bins, summing their contents (I do this because I sometimes want to do other things, like maximize).
I have tried using Gather and other such functions to do this but the above method was ridiculously faster, though perhaps I am using Gather poorly. Anyway It still takes a few minutes to do the sorting by my method and I feel like Mathematica can do better. Does anyone have a nice efficient algorithm handy?
Here's my approach:
These two approaches (
res1
&res2
) can handle 100k and 200k elements per second, respectively, on this machine. Is this sufficiently fast, or do you need to run this whole program in a loop?Here is a method based on Szabolcs's post that is about about an order of magnitude faster.
Gives about {2.012217, Null}
Gives about {0.195228, Null}
"TreatRepeatedEntries" -> 1 adds duplicate positions up.
Here's my approach using the function SelectEquivalents defined in What is in your Mathematica tool bag? which is perfect for a problem like this one.
If you would want to group according to more than two dimensions you could use in FinalFunction this function to give to the list result the desired dimension (I don't remember where I found it).
I intend to do a rewrite of the code below because of Szabolcs' readability concerns. Until then, know that if your bins are regular, and you can use
Round
,Floor
, orCeiling
(with a second argument) in place ofNearest
, the code below will be much faster. On my system, it tests faster than theGatherBy
solution also posted.Assuming I understand your requirements, I propose:
Refactored:
Use: