Following from here.
Assuming that dev_X is a vector.
int * X = (int*) malloc( ThreadsPerBlockX * BlocksPerGridX * sizeof(*X) );
for ( int i = 0; i < ThreadsPerBlockX * BlocksPerGridX; i++ )
X[ i ] = i;
// create device vectors
thrust::device_vector<int> dev_X ( ThreadsPerBlockX * BlocksPerGridX );
//copy to device
thrust::copy( X , X + ThreadsPerBlockX * BlocksPerGridX , dev_X.begin() );
The following performs the subtraction dev_Kx - dev_X:
thrust::transform( dev_Kx.begin(), dev_Kx.end(), dev_X.begin() , distX.begin() , thrust::minus<float>() );
I want to use the whole dev_Kx vector (which it does, because the range goes from .begin() to .end()) and the whole dev_X vector.
The above code only passes dev_X.begin(). Does that mean it will use the whole dev_X vector, starting from the beginning? Or do I have to pass an extra argument pointing to dev_X.end()? (In the above function call I can't just add this extra argument.)
Also, for example, if I want to use
thrust::transform( dev_Kx, dev_Kx + i , dev_X.begin() ,distX.begin() , thrust::minus<int>() );
Then dev_Kx would go from 0 to i, but what about dev_X.begin()? Will it use the same length (0 to i), or will it use the full length of dev_X?
Many thrust (and standard library) functions take a range as their first arguments and then assume that every other iterator is backed by a container of at least the same size. A range is a pair of iterators indicating the beginning and end of a sequence.
For example:
thrust::copy(
X.begin(), // begin input iterator
X.end(), // end input iterator
dev_X.begin() // begin output iterator
);
This copies the entire contents of X into dev_X. Why is dev_X.end() not needed? Because thrust requires that you, the programmer, take care of sizing dev_X so that it can hold at least as many elements as there are in the input range. If you don't meet that requirement, the behavior is undefined.
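As a minimal host-side sketch of that sizing requirement, the snippet below uses std::copy as a stand-in for thrust::copy, since both follow the same "input range plus output start" convention. The helper name copy_into_sized_destination is hypothetical and exists only for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical helper (not part of thrust or the standard library):
// copies `src` into a destination that is sized before the copy.
std::vector<int> copy_into_sized_destination(const std::vector<int>& src) {
    // Size the destination first; std::copy never grows it for you.
    std::vector<int> dst(src.size());

    // Precondition the caller is responsible for: room for every input element.
    assert(dst.size() >= src.size());

    // Only the input range carries an end iterator; the output is just a start.
    std::copy(src.begin(), src.end(), dst.begin());
    return dst;
}
```

Had dst been default-constructed (size 0), the same std::copy call would write past its end, which is the undefined behavior described above.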
When you do this:
thrust::transform(
dev_Kx.begin(), // begin input (1) iterator
dev_Kx.end(), // end input (1) iterator
dev_X.begin(), // begin input (2) iterator
distX.begin(), // output iterator
thrust::minus<float>()
);
What thrust sees is an input range from dev_Kx.begin() to dev_Kx.end(). It has an explicit size of dev_Kx.end() - dev_Kx.begin(). Why are dev_X.end() and distX.end() not needed? Because they have an implicit size of dev_Kx.end() - dev_Kx.begin() too. For example, if there are 10 elements in dev_Kx, then transform will:
- Use the 10 elements of dev_Kx
- Use 10 elements of dev_X (which must hold at least 10 elements)
- Perform the subtraction and store the 10 results in distX, which must be able to hold at least 10 elements.
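The three steps above can be sketched on the host with std::transform, which takes its arguments in the same order as the thrust::transform call. This is only an illustration under that assumption: subtract_all is a hypothetical helper, and plain std::vector stands in for thrust::device_vector.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <vector>

// Hypothetical helper: computes kx[i] - x[i] for every element of kx.
std::vector<int> subtract_all(const std::vector<int>& kx,
                              const std::vector<int>& x) {
    // The second input has no end iterator, so it must be at least as long as kx.
    assert(x.size() >= kx.size());

    // The output is sized to match the explicit range [kx.begin(), kx.end()).
    std::vector<int> dist(kx.size());

    std::transform(kx.begin(), kx.end(),   // explicit range: sets the element count
                   x.begin(),              // implicit length: same count as kx
                   dist.begin(),           // implicit length: same count as kx
                   std::minus<int>());
    return dist;
}
```

If x holds more elements than kx, the extra ones are simply never read.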
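Maybe looking at the implementation would clear up any doubts. Here's a simplified, sequential sketch of it:
template <typename InputIt1, typename InputIt2,
          typename OutputIt, typename BinaryFunction>
OutputIt transform(InputIt1 input1_begin, InputIt1 input1_end,
                   InputIt2 input2_begin, OutputIt output,
                   BinaryFunction op) {
    // The loop is bounded by the first input range only.
    while (input1_begin != input1_end) {
        *output++ = op(*input1_begin++, *input2_begin++);
    }
    return output;
}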
Notice how only one end iterator is needed.
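The same rule covers the sub-range case from the question: if the first range stops after i elements, only i elements of the second input and of the output are touched. Below is a hedged host-side sketch using std::transform; subtract_first_n is a hypothetical helper, and the sub-range is spelled begin() + n, which is an assumption about how such a call would be written for a vector.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical helper: subtracts only the first n elements; the rest of the
// output keeps its initial value of 0.
std::vector<int> subtract_first_n(const std::vector<int>& kx,
                                  const std::vector<int>& x,
                                  std::size_t n) {
    // Both inputs must provide at least n elements.
    assert(n <= kx.size() && n <= x.size());

    std::vector<int> dist(kx.size(), 0);

    // The explicit range is [begin, begin + n), so exactly n elements are
    // read from x and written to dist, regardless of their full sizes.
    std::transform(kx.begin(), kx.begin() + static_cast<std::ptrdiff_t>(n),
                   x.begin(), dist.begin(), std::minus<int>());
    return dist;
}
```

So the length always comes from the first range, never from the full size of the container behind the second input.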
On an unrelated note, the following:
int * X = (int*) malloc( ThreadsPerBlockX * BlocksPerGridX * sizeof(*X) );
for ( int i = 0; i < ThreadsPerBlockX * BlocksPerGridX; i++ )
X[ i ] = i;
could be rewritten in more idiomatic, less error-prone C++ as:
#include <numeric> // std::iota
#include <vector>
std::vector<int> X(ThreadsPerBlockX * BlocksPerGridX);
std::iota(X.begin(), X.end(), 0);