proper thrust call for subtraction

Posted 2019-07-26 00:21

Question:

Following from here.

Assuming that dev_X is a vector.

int * X = (int*) malloc( ThreadsPerBlockX * BlocksPerGridX * sizeof(*X) );


for ( int i = 0; i < ThreadsPerBlockX * BlocksPerGridX; i++ )
    X[ i ] = i;

// create device vectors
thrust::device_vector<int> dev_X ( ThreadsPerBlockX * BlocksPerGridX );

//copy to device
thrust::copy( X , X + ThreadsPerBlockX * BlocksPerGridX , dev_X.begin() );

The following performs the subtraction dev_Kx - dev_X:

   thrust::transform( dev_Kx.begin(), dev_Kx.end(), dev_X.begin() , distX.begin() , thrust::minus<float>() );

I want to use the whole dev_Kx vector (which this call does, since the range goes from .begin() to .end()) and the whole dev_X vector.

The above code uses only dev_X.begin().

Does that mean it will use the whole dev_X vector, starting from the beginning? Or do I have to pass an extra argument pointing to dev_X.end()? (In the function call above I can't simply add that extra argument.)

Also , for example:

If I want to use

thrust::transform( dev_Kx.begin(), dev_Kx.begin() + i , dev_X.begin() , distX.begin() , thrust::minus<int>() );

Then dev_Kx would go from 0 to i, but what about dev_X.begin()? Will it use the same length (0 to i), or the length of dev_X?

Answer 1:

Many Thrust (and standard library) functions take a range as their first parameters and then assume that all other iterators are backed by containers of at least the same size. A range is a pair of iterators indicating the beginning and end of a sequence.

For example:

thrust::copy(
    X.begin(),    // begin input iterator
    X.end(),      // end input iterator
    dev_X.begin() // begin output iterator
);

This copies the entire contents of X into dev_X. Why is dev_X.end() not needed? Because Thrust requires that you, the programmer, take care of sizing dev_X so that it can hold at least as many elements as there are in the input range. If you don't meet that requirement, the behavior is undefined.
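
The sizing requirement can be tried on the host with std::copy, which follows the same convention. This is only a minimal sketch using plain std::vector containers; the helper name copy_into is made up for illustration and is not part of Thrust or the standard library:

```cpp
#include <algorithm>
#include <vector>

// Hypothetical helper: make sure dst can hold src before copying.
void copy_into(const std::vector<int>& src, std::vector<int>& dst) {
    if (dst.size() < src.size()) {
        dst.resize(src.size());  // the algorithm will not grow dst for us
    }
    // Only the source supplies an end iterator; the destination is
    // assumed to have room for src.size() elements.
    std::copy(src.begin(), src.end(), dst.begin());
}
```

Without the resize, copying into a destination that is too small would write past its end.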

When you do this:

thrust::transform(
    dev_Kx.begin(), // begin input (1) iterator
    dev_Kx.end(),   // end input (1) iterator
    dev_X.begin(),  // begin input (2) iterator
    distX.begin(),  // output iterator
    thrust::minus<float>()
);

What thrust sees is an input range from dev_Kx.begin() to dev_Kx.end(). It has an explicit size of dev_Kx.end() - dev_Kx.begin(). Why are dev_X.end() and distX.end() not needed? Because they have an implicit size of dev_Kx.end() - dev_Kx.begin() too. For example, if there are 10 elements in dev_Kx, then transform will:

  • Use the 10 elements of dev_Kx
  • Use 10 elements of dev_X (which must hold at least 10 elements)
  • Perform the subtraction and store the 10 results in distX, which must be able to hold at least 10 elements.
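
The three points above can be checked on the host with std::transform, which takes its arguments in the same order as the thrust::transform call. A minimal sketch, assuming plain std::vector<int> inputs; the helper name subtract_all is hypothetical:

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <vector>

// Hypothetical helper: element-wise kx - x over the whole of kx.
// Only kx supplies an end iterator; x and the result are assumed
// to hold at least kx.size() elements.
std::vector<int> subtract_all(const std::vector<int>& kx,
                              const std::vector<int>& x) {
    assert(x.size() >= kx.size());        // second input must be large enough
    std::vector<int> dist(kx.size());     // output sized by the caller
    std::transform(kx.begin(), kx.end(),  // explicit range
                   x.begin(),             // implicit length: kx.size()
                   dist.begin(),          // implicit length: kx.size()
                   std::minus<int>());
    return dist;
}
```

If x is longer than kx, the extra elements of x are simply never read.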

Maybe looking at the implementation would clear up any doubts. Here's some pseudo code:

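template <typename InputIterator1, typename InputIterator2,
          typename OutputIterator, typename BinaryFunction>
OutputIterator transform(InputIterator1 input1_begin, InputIterator1 input1_end,
                         InputIterator2 input2_begin, OutputIterator output,
                         BinaryFunction op) {
    // The first range alone decides how many elements are processed.
    while (input1_begin != input1_end) {
        *output++ = op(*input1_begin++, *input2_begin++);
    }
    return output;
}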

Notice how only one end iterator is needed.
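
The same reasoning covers the partial range from the question: if the first range spans only i elements, only i elements are read from the second input and written to the output. A minimal host-side sketch with std::transform, assuming std::vector<int> inputs; the helper name subtract_first_n is hypothetical:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical helper: subtract only the first n elements of kx and x.
std::vector<int> subtract_first_n(const std::vector<int>& kx,
                                  const std::vector<int>& x,
                                  std::size_t n) {
    assert(n <= kx.size() && n <= x.size());
    const auto count = static_cast<std::ptrdiff_t>(n);
    std::vector<int> dist(n);
    std::transform(kx.begin(), kx.begin() + count,  // range of n elements
                   x.begin(),                       // n elements read from x
                   dist.begin(),                    // n results written
                   std::minus<int>());
    return dist;
}
```

With kx = {10, 20, 30, 40}, x = {1, 2, 3, 4} and n = 2, the result is {9, 18}; the rest of x is never touched. In Thrust the corresponding range would be written dev_Kx.begin(), dev_Kx.begin() + i.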


On an unrelated note, the following:

int * X = (int*) malloc( ThreadsPerBlockX * BlocksPerGridX * sizeof(*X) );
for ( int i = 0; i < ThreadsPerBlockX * BlocksPerGridX; i++ )
    X[ i ] = i;

could be rewritten in more idiomatic, less error-prone C++ as:

// requires #include <vector> and #include <numeric>
std::vector<int> X(ThreadsPerBlockX * BlocksPerGridX);
std::iota(X.begin(), X.end(), 0); // fills X with 0, 1, 2, ...


Tags: cuda thrust