I have code given by @m.s.:
#include <thrust/device_vector.h>
#include <thrust/scan.h>
#include <thrust/scatter.h>
#include <thrust/copy.h>
#include <thrust/functional.h>
#include <thrust/iterator/transform_iterator.h>
#include <thrust/iterator/counting_iterator.h>
#include <iostream>
#include <iterator>

// Maps negative values to 0 so they do not contribute to the scan.
struct omit_negative : public thrust::unary_function<int, int>
{
    __host__ __device__
    int operator()(int value)
    {
        if (value < 0)
        {
            value = 0;
        }
        return value;
    }
};

int main()
{
    int array[] = {2,1,-1,3,-1,2};
    const int array_size = sizeof(array)/sizeof(array[0]);
    thrust::device_vector<int> d_array(array, array + array_size);
    thrust::device_vector<int> d_result(array_size);

    std::cout << "input data" << std::endl;
    thrust::copy(d_array.begin(), d_array.end(), std::ostream_iterator<int>(std::cout, " "));

    // Scan the input with negative values treated as 0.
    thrust::inclusive_scan(thrust::make_transform_iterator(d_array.begin(), omit_negative()),
                           thrust::make_transform_iterator(d_array.end(), omit_negative()),
                           d_result.begin());

    std::cout << std::endl << "after inclusive_scan" << std::endl;
    thrust::copy(d_result.begin(), d_result.end(), std::ostream_iterator<int>(std::cout, " "));

    // Copy the original negative entries back over the scanned values.
    using namespace thrust::placeholders;
    thrust::scatter_if(d_array.begin(),
                       d_array.end(),
                       thrust::make_counting_iterator(0),
                       d_array.begin(),
                       d_result.begin(),
                       _1 < 0);

    std::cout << std::endl << "after scatter_if" << std::endl;
    thrust::copy(d_result.begin(), d_result.end(), std::ostream_iterator<int>(std::cout, " "));
    std::cout << std::endl;
}
It refers to a previous question.
I didn't know about Thrust, but now I think I'll drop the idea of writing my own code and use Thrust instead. I modified my algorithm: instead of -1 there are now 0s (so make_transform_iterator is not necessary). Also, your example creates the array on the host. I actually have the array already prepared and stored on the device, and I'd like to use it (instead of vectors) to avoid allocating redundant memory and to avoid copying memory (it costs time, and minimal time cost is my goal). I'm not sure how to use arrays instead of vectors. Here is what I've written:
int* dev_l_set = 0;
cudaMalloc((void**)&dev_l_set, actualVerticesRowCount * sizeof(int));
...prepare array in kernel...
thrust::device_vector<int> d_result(actualVerticesRowCount);
thrust::inclusive_scan(dev_l_set, dev_l_set + actualVerticesRowCount, dev_l_set);
using namespace thrust::placeholders;
thrust::scatter_if(dev_l_set, dev_l_set + actualVerticesRowCount, thrust::make_counting_iterator(0), dev_l_set, d_result.begin(), _1 <= 0);
cudaFree(dev_l_set);
dev_l_set = thrust::raw_pointer_cast(d_result.data());
I can't cast from device_vector to int*, but I'd like to store the result of the scan in the initial dev_l_set array. It would also be great to do it in place; is it necessary to use d_result in scatter_if?
Actual input (stored in an int* on the device side), for example:
dev_l_set[0] = 0
dev_l_set[1] = 2
dev_l_set[2] = 0
dev_l_set[3] = 3
dev_l_set[4] = 0
dev_l_set[5] = 1
Desired output for the above input:
dev_l_set[0] = 0
dev_l_set[1] = 2
dev_l_set[2] = 0
dev_l_set[3] = 5
dev_l_set[4] = 0
dev_l_set[5] = 6
dev_l_set should hold the input, then the scan should be done in place, and at the end it should hold the output.
It could be something like this:
int* dev_l_set = 0;
cudaMalloc((void**)&dev_l_set, actualVerticesRowCount * sizeof(int));
...prepare array in kernel... (see input data)
thrust::inclusive_scan(dev_l_set, dev_l_set + actualVerticesRowCount, dev_l_set);
using namespace thrust::placeholders;
thrust::scatter_if(dev_l_set, dev_l_set + actualVerticesRowCount, thrust::make_counting_iterator(0), dev_l_set, dev_l_set, _1 <= 0);
My CUDA version (the minimum the app should work with) is 5.5 (Tesla M2070), and unfortunately I can't use C++11.
You can do the inclusive scan as well as the scatter step in place without an additional result vector.
The following example directly uses the data from a raw device pointer without thrust::device_vector. After the inclusive scan, the elements that were previously 0 are restored.
As @JaredHoberock pointed out, one should not rely on code residing in thrust::detail. I therefore edited my answer and copied part of the code from thrust::detail::head_flags directly into this example.