Suppose I have
- a data array,
- an array containing keys referencing entries in the data array and
- a third array which contains an id for every key array entry
e.g.
DataType dataArray[5];
int keyArray[10] = {1, 2, 3, 1, 2, 2, 1, 1, 1, 1};
int ids[10] = {0, 0, 0, 1, 2, 2, 2, 3, 3, 3};
How can I execute a custom operator ResultDataType fun(int key1, int key2, int id)
pairwise for each segment of ids ignoring the case key1 == key2
using thrust?
In this example I'd like to execute and store the result of:
fun(1,2,0)
fun(1,3,0)
fun(2,3,0)
fun(2,1,2)
I've found this solution to your question. I used a mix of cuda kernels and thrust primitives. I suppose that your operator has the commutative property on the keys arguments, that is
My solution produce three output arrays (keys1, keys2 and ids) in two steps. In the first step it computes only the number of elements in the output arrays and in the second step it fill the arrays. Basically the algorithm runs two times: in the first time it "simulates" the writing and the second time it actually writes the output. This is a common pattern that I use when the output size depends on the input data.
Here are the steps:
Here is the output of my algorithm:
Note that the last column in the output is not exactly like you want but if the commutative property hold it's fine.
Here is my complete code. Probably this is not the fastest solution but I think it's simple.
This turned out to be more involved than I initially thought. My interpretation is that you essentially want to compute all combinations of N things taken two at a time, without repetition, per segment (N is different for each segment). I'll present an answer that I think covers most of the concepts you might want to consider. It's just one possible solution method, of course.
This may not arrive at precisely the solution you want (although I think it's pretty close). My purpose is not to give you a finished code, but to demonstrate a possible algorithmic approach.
Here's a walkthrough of the algorithm:
Sort the keys by segment. This step isn't necessary if you can guarantee that like keys (within a segment) are grouped together. (Your data as presented wouldn't require this step.) We can use back-to-back
stable_sort_by_key
to accomplish this. Output:Reduce the data set to just the unique keys, per segment (remove duplicated keys.) We can do this on the keys and segments together with
thrust::unique_copy
, and an appropriate functor to define equality of keys within a segment only. Output:Compute the length of each segment now that duplicate keys are removed. We can do this with
thrust::reduce_by_key
and aconstant_iterator
. Output:Now that we know each segment length, we want to remove any segment (plus it's associate keys) for which the segment length is 1. These segments don't have any key pairs possible. To facilitate this, we also need to compute the starting index of each segment and create a stencil that identifies the segments whose length is one, then we can use
remove_if
andscatter_if
to remove the associated segment and key data. Output:and the reduced data:
Our goal at this point will be to create a vector of appropriate length so that it can hold one entry for each combination of N things taken two at a time, where N is the length of each segment. So for the first segment above, it has 3 items, so the combination of 3 things taken 2 at a time requires storage for 3 combinations total. Likewise the second segment above of length 2 has only one unique combination between the 2 keys, therefore storage is needed for only one combination for that segment. We can compute this storage length using the standard combinations formula reduced to combinations of N things taken 2 at a time:
we'll use this formula, passed to
thrust::transform_reduce
along with our segment lengths, to compute the overall storage length needed. (4 combinations total, for this example).Once we have the overall length determined, and the necessary vectors of that length allocated, we need to generate the actual combinations that belong to each segment. This again is a multi-step sequence that begins with generating flags to mark (delineate) each "segment" within these vectors. With the flag array, we can use
exclusive_scan_by_key
to generate an ordinal sequence identifying the combination that belongs in each position, per segment. Output:With the ordinal sequence per segment, we now need to generate the actual unique combinations, per segment. This step required some consideration to try and come up with an algorithm to accomplish this in a fixed time (ie. without iteration). The algorithm I came up with is to map each ordinal sequence to a matrix, so for a segment with 4 combinations (which would have 4 things taken 2 at a time, so 6 total combinations, larger than any in this example):
We then "re-map" any combinations (
Cx
above) that are below the main diagonal to alternate locations in the matrix, like this:we can then read off unique combination pairs as the row and column indices of the above specially-crafted matrix. (effectively a linear-to-triangular mapping) So C1 corresponds to the combination of logical key 0 and logical key 1. C6 corresponds to the combination of logical key 1 and logical key 3. This matrix assembly and mapping to row and column "indices" to produce unique combinations is handled by the
comb_n
functor passed tothrust::for_each_n
, which is also receiving the segment lengths and the input segment ordinal sequences, and is generating "logical" keys 1 and 2 as output:(added note: I believe the approach I am describing here is comparable to what is discussed in this question/answer, which I stumbled upon recently, although they appear to be different at first glance.)
The last transformation step is to convert our logical keys and segments into "actual" keys and segments. Output:
Here's a complete code that implements the above. It may not do precisely what you want, but I think it's pretty close to what you described, and hopefully will be useful for ideas if you want to do this with thrust: