I have the following input data:
e = 0 0 0 0 0 0 | 1 1 1
t = 1 1 4 4 4 5 | 1 6 7
i = 0 1 2 3 4 5 | 6 7 8 // indices from [0,n-1]
The data is first sorted by e, then by t. The key e identifies segments in the data. In this case:
segment_0 = [0,5]
segment_1 = [6,8]
Each segment is again segmented by t. In this case:
sub_segment_0_0 = [0,1] // t==1
sub_segment_0_1 = [2,4] // t==4
sub_segment_0_2 = [5,5] // t==5
sub_segment_1_0 = [6,6] // t==1
sub_segment_1_1 = [7,7] // t==6
sub_segment_1_2 = [8,8] // t==7
I want to create the following output sequences:
f = 2 2 5 5 5 6 | 7 8 9
l = 6 6 6 6 6 6 | 9 9 9
f contains the start index of the next sub_segment within the current segment.
l contains (the end index of the last sub_segment within the current segment) + 1.
For the last sub_segment of each segment, both values should point one past its end index.
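To pin down these definitions, here is a minimal sequential sketch in plain C++ (no Thrust). The helper name reference_f_l is hypothetical, and the code assumes e and t have equal length and are already sorted by e, then by t. It is meant only as a reference to check a parallel version against.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Sequential reference for f and l (hypothetical helper, not a Thrust API).
// Assumes e and t have equal length and are sorted by e, then by t.
std::pair<std::vector<std::uint32_t>, std::vector<std::uint32_t>>
reference_f_l(const std::vector<std::uint32_t>& e,
              const std::vector<std::uint32_t>& t)
{
    assert(e.size() == t.size());
    const std::size_t n = e.size();
    std::vector<std::uint32_t> f(n), l(n);

    // Walk backwards so each element can reuse the result of its right neighbour.
    for (std::size_t k = n; k-- > 0;) {
        const bool last_in_segment = (k + 1 == n) || (e[k + 1] != e[k]);
        const bool last_in_sub_segment = last_in_segment || (t[k + 1] != t[k]);

        // l: one past the end of the current segment.
        l[k] = last_in_segment ? static_cast<std::uint32_t>(k + 1) : l[k + 1];
        // f: start of the next sub_segment (equals l[k] in the last sub_segment).
        f[k] = last_in_sub_segment ? static_cast<std::uint32_t>(k + 1) : f[k + 1];
    }
    return {f, l};
}
```

Called with the nine-element e and t above, the returned pair holds f in .first and l in .second.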
In order to calculate f, I tried using thrust::upper_bound, but this only works if I have just one segment:
#include <thrust/host_vector.h>
#include <thrust/copy.h>
#include <thrust/binary_search.h>
#include <thrust/device_vector.h>
#include <stdint.h>
#include <iostream>
#include <iterator> // std::ostream_iterator

#define PRINTER(name) print(#name, (name))

// Print a host or device vector as one tab-separated line.
template <template <typename...> class V, typename T, typename ...Args>
void print(const char* name, const V<T,Args...> & v)
{
    std::cout << name << ":\t";
    thrust::copy(v.begin(), v.end(), std::ostream_iterator<T>(std::cout, "\t"));
    std::cout << std::endl;
}

int main()
{
    uint32_t e[] = {0,0,0,0,0,0};
    uint32_t t[] = {1,1,4,4,4,5};
    uint32_t i[] = {0,1,2,3,4,5};
    int size = sizeof(i)/sizeof(i[0]);

    typedef thrust::host_vector<uint32_t> HVec;
    typedef thrust::device_vector<uint32_t> DVec;

    HVec h_i(i,i+size);
    HVec h_e(e,e+size);
    HVec h_t(t,t+size);
    DVec d_i = h_i;
    DVec d_e = h_e;
    DVec d_t = h_t;
    PRINTER(d_e);
    PRINTER(d_t);
    PRINTER(d_i);

    // Vectorized upper_bound: search d_t for each of its own values.
    DVec upper(size);
    thrust::upper_bound(d_t.begin(), d_t.end(), d_t.begin(), d_t.end(), upper.begin());
    PRINTER(upper);
    return 0;
}
output:
d_e: 0 0 0 0 0 0
d_t: 1 1 4 4 4 5
d_i: 0 1 2 3 4 5
upper: 2 2 5 5 5 6
If I use the input data containing two segments, it no longer works, because t is not sorted across segments and there is no thrust::upper_bound_by_key:
// replace in the code above
uint32_t e[] = {0,0,0,0,0,0,1,1,1};
uint32_t t[] = {1,1,4,4,4,5,1,6,7};
uint32_t i[] = {0,1,2,3,4,5,6,7,8};
output:
d_e: 0 0 0 0 0 0 1 1 1
d_t: 1 1 4 4 4 5 1 6 7
d_i: 0 1 2 3 4 5 6 7 8
upper: 2 2 7 7 7 7 2 8 9
How would an upper_bound_by_key be implemented for my data?
And how can I efficiently calculate l?
I am open to any solution, thrust is not a necessity.
Here is one possible approach:
1. Mark the end of your (t-)segments. I assume that it's possible for an e-segment to have a single t-segment. If that's the case, then adjacent e-segments could have t-segments of the same numerical value (1 presumably). Therefore marking the end of segments needs to consider both e and t. I use a method basically like adjacent difference, except it considers both e and t, using thrust::transform and shifted representations of e and t.
2. Determine the value that f will hold for each segment. Now that we know the end of each (t-)segment, we can simply pick the next value out of i (using copy_if, and the segment end markers as our stencil) as the f value for the preceding segment. To facilitate this, and since your i is just an index sequence, I create an i vector that is one element longer than what you have shown.
3. Create a numerically increasing index for each segment. This is just an exclusive scan on the vector created in step 1.
4. Use the index sequence created in step 3 to "scatter" the f segment values created in step 2 into our f result ("scatter" is done with thrust::copy and a permutation iterator).
Here's a worked example, borrowing from your code:
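As an illustration of the four steps, here is a host-side sketch using only the C++ standard library. It is not the Thrust code from this answer: the helper name next_sub_segment_start is hypothetical, plain loops stand in for thrust::transform, copy_if with a stencil, and the permutation-iterator copy, and std::partial_sum written one slot to the right plays the role of the exclusive scan. It assumes e and t have equal length and are sorted by e, then by t.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <vector>

// Host-side sketch of steps 1-4 (hypothetical helper, not a Thrust API).
// Assumes e and t have equal length and are sorted by e, then by t.
std::vector<std::uint32_t> next_sub_segment_start(
    const std::vector<std::uint32_t>& e,
    const std::vector<std::uint32_t>& t)
{
    assert(e.size() == t.size());
    const std::size_t n = e.size();
    if (n == 0) return {};

    // Step 1: flags[k] == 1 if element k is the last one of its (e, t) sub_segment.
    std::vector<std::uint32_t> flags(n, 1);  // the final element always ends one
    for (std::size_t k = 0; k + 1 < n; ++k)
        flags[k] = (e[k] != e[k + 1] || t[k] != t[k + 1]) ? 1u : 0u;

    // Step 2: i extended by one element (0..n); keep i[k + 1] where flags[k] is set.
    std::vector<std::uint32_t> i(n + 1);
    std::iota(i.begin(), i.end(), 0u);
    std::vector<std::uint32_t> f_per_sub_segment;
    for (std::size_t k = 0; k < n; ++k)
        if (flags[k]) f_per_sub_segment.push_back(i[k + 1]);

    // Step 3: exclusive scan of the flags numbers the sub_segments 0, 1, 2, ...
    std::vector<std::uint32_t> scan(n + 1, 0);
    std::partial_sum(flags.begin(), flags.end(), scan.begin() + 1);

    // Step 4: gather the per-sub_segment value back to every element.
    std::vector<std::uint32_t> f(n);
    for (std::size_t k = 0; k < n; ++k)
        f[k] = f_per_sub_segment[scan[k]];
    return f;
}
```

For the nine-element input from the question, the flags are 0 1 0 0 1 1 1 1 1, the per-sub_segment values are 2 5 6 7 8 9, and the gathered result is f = 2 2 5 5 5 6 7 8 9.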
A very similar sequence could be used to create the l vector.
I found another way to do this.
In order to be able to use lower_bound, I needed to make sure that t is globally sorted. To do that, I first find the starting points of each sub_segment using adjacent_difference. After that, scatter_if copies increasing numbers from a counting_iterator for each starting point of a sub_segment. Finally, inclusive_scan spreads the same value across each sub_segment. I combined the two steps before the inclusive_scan into the custom functor my_scatter to achieve better kernel fusion.
Now upper_bound is applied to these globally increasing values to calculate f.
l can be calculated by applying upper_bound on e.
I am not sure how the efficiency of my approach compares to the approach presented by @RobertCrovella.
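The idea can be sketched on the host with the C++ standard library. This is a simplified illustration, not the original Thrust code: the struct FL and the function f_l_via_upper_bound are hypothetical names, and instead of scatter_if with a counting_iterator it marks each sub_segment start with 1 and lets a summing std::partial_sum produce the globally non-decreasing ids. It assumes e and t have equal length and are sorted by e, then by t.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <vector>

// Result pair for the sketch (hypothetical type).
struct FL {
    std::vector<std::uint32_t> f;
    std::vector<std::uint32_t> l;
};

// Host-side sketch: build globally sorted sub_segment ids, then binary-search.
// Assumes e and t have equal length and are sorted by e, then by t.
FL f_l_via_upper_bound(const std::vector<std::uint32_t>& e,
                       const std::vector<std::uint32_t>& t)
{
    assert(e.size() == t.size());
    const std::size_t n = e.size();

    // Mark the first element of every sub_segment after the first one
    // (a change in e or in t starts a new sub_segment).
    std::vector<std::uint32_t> ids(n, 0);
    for (std::size_t k = 1; k < n; ++k)
        ids[k] = (e[k] != e[k - 1] || t[k] != t[k - 1]) ? 1u : 0u;

    // The inclusive scan turns the marks into a non-decreasing sub_segment id.
    std::partial_sum(ids.begin(), ids.end(), ids.begin());

    FL out{std::vector<std::uint32_t>(n), std::vector<std::uint32_t>(n)};
    for (std::size_t k = 0; k < n; ++k) {
        // f: first position whose sub_segment id exceeds that of element k.
        out.f[k] = static_cast<std::uint32_t>(
            std::upper_bound(ids.begin(), ids.end(), ids[k]) - ids.begin());
        // l: first position whose e exceeds that of element k.
        out.l[k] = static_cast<std::uint32_t>(
            std::upper_bound(e.begin(), e.end(), e[k]) - e.begin());
    }
    return out;
}
```

On the example input from the question, the ids are 0 0 1 1 1 2 3 4 5, which gives f = 2 2 5 5 5 6 7 8 9 and l = 6 6 6 6 6 6 9 9 9, matching the requested sequences.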
output: