thrust: fill isolate space

2020-08-05 10:49发布

问题:

I have an array like this:

0 0 0 1 0 0 0 0 5 0 0 3 0 0 0 8 0 0

I want every non-zero elements to expand themselves one element at a time until it reaches other non-zero elements, the result is like this:

1 1 1 1 1 1 5 5 5 5 3 3 3 3 8 8 8 8

Is there any way to do this using thrust?

回答1:

Is there any way to do this using thrust?

Yes, here is one possible approach.

  1. For each position in the sequence, compute 2 distances. The first is the distance to the nearest non-zero value in the left direction, and the second is the distance to the nearest non-zero value in the right direction. If the position itself is non-zero, both left and right distances will be computed as zero. Our basic engine for this will be segmented inclusive scans, one computed in the left to right direction (to compute the distance from the left for each zero segment), and the other computed in the reverse direction (to compute the distance from the right for each zero segment). Using your example:

    a vector:    0 0 0 1 0 0 0 0 5 0 0 3 0 0 0 8 0 0
    a left dist: ? ? ? 0 1 2 3 4 0 1 2 0 1 2 3 0 1 2
    a right dist:3 2 1 0 4 3 2 1 0 2 1 0 3 2 1 0 ? ?
    

    Note that in each distance computation, we must special-case one end if that end does not happen to begin with a non-zero value (because the distance from that direction is "undefined"). We will special case those ? distances by assigning them large values, the reason for which will become evident in the next step.

  2. We now will create a "map" vector, which, for each output position, allows us to select an element from the original input vector that belongs in that output position. This map vector is computed by taking the lesser of the two computed distances, and adjusting the index either from the left or the right, by that distance:

    output index: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17
    a left dist:  ? ? ? 0 1 2 3 4 0 1  2  0  1  2  3  0  1  2
    a right dist: 3 2 1 0 4 3 2 1 0 2  1  0  3  2  1  0  ?  ?
    map vector:   3 3 3 3 3 3 8 8 8 8 11 11 11 11 15 15 15 15
    

    For the map vector computation, if a left dist > a right dist then we take the output index and add a right dist to it, to produce the map vector element at that position. Otherwise, we take the output index and subtract a left dist from it. Note that the special-case ? entries above should be considered to be "arbitrarily large" for this computation. This is simulated in the code by using a large integer (1<<30).

  3. Once we have the map vector, it's a trivial matter to use it to do a mapped copy from input to output vectors:

    a vector:    0 0 0 1 0 0 0 0 5 0  0  3  0  0  0  8  0  0
    map vector:  3 3 3 3 3 3 8 8 8 8 11 11 11 11 15 15 15 15
    out vector:  1 1 1 1 1 1 5 5 5 5  3  3  3  3  8  8  8  8
    

Here is a fully worked example:

$ cat t610.cu
#include <thrust/device_vector.h>
#include <thrust/copy.h>
#include <thrust/scan.h>
#include <thrust/iterator/permutation_iterator.h>
#include <thrust/iterator/counting_iterator.h>
#include <thrust/iterator/zip_iterator.h>
#include <thrust/functional.h>
#include <thrust/transform.h>
#include <thrust/sequence.h>
#include <iostream>
#define IVAL (1<<30)

// used to create input vector for prefix sums (distance vector computation)
struct is_zero {

  template <typename T>
  __host__ __device__
  T operator() (T val) {
    return (val) ? 0:1;
    }
};

// inc and dec help with special casing of left and right ends
struct inc {

  template <typename T>
  __host__ __device__
  T operator() (T val) {
    return val+IVAL;
    }
};

struct dec {

  template <typename T>
  __host__ __device__
  T operator() (T val) {
    return val-IVAL;
    }
};

// this functor is lifted from thrust example code
// and is used to enable segmented scans based on flag delimitors
// BinaryPredicate for the head flag segment representation
// equivalent to thrust::not2(thrust::project2nd<int,int>()));
template <typename HeadFlagType>
struct head_flag_predicate : public thrust::binary_function<HeadFlagType,HeadFlagType,bool>
{
    __host__ __device__
    bool operator()(HeadFlagType left, HeadFlagType right) const
    {
        return !right;
    }
};

// distance tuple ordering is left (0), then right (1)
struct map_functor
{
  template <typename T>
  __host__ __device__
  int operator() (T dist){
    int leftdist =  thrust::get<0>(dist);
    int rightdist = thrust::get<1>(dist);
    int idx =       thrust::get<2>(dist);
    return (leftdist > rightdist) ? (idx+rightdist):(idx-leftdist);
    }
};

int main(){

  int h_a[] = { 0, 0, 0, 1, 0, 0, 0, 0, 5, 0, 0, 3, 0, 0, 0, 8, 0, 0 };
  int n = sizeof(h_a)/sizeof(h_a[0]);
  thrust::device_vector<int> a(h_a, h_a+n);
  thrust::device_vector<int> az(n);
  thrust::device_vector<int> asl(n);
  thrust::device_vector<int> asr(n);
  thrust::transform(a.begin(), a.end(), az.begin(), is_zero());
  // set up distance from the  left vector (asl)
  thrust::transform_if(az.begin(), az.begin()+1, a.begin(), az.begin(),inc(), is_zero());
  thrust::transform(a.begin(), a.begin()+1, a.begin(), inc());
  thrust::inclusive_scan_by_key(a.begin(), a.end(), az.begin(), asl.begin(), head_flag_predicate<int>());
  thrust::transform(a.begin(), a.begin()+1, a.begin(), dec());
  thrust::transform_if(az.begin(), az.begin()+1, a.begin(), az.begin(), dec(), is_zero());
  // set up distance from the right vector (asr)
  thrust::device_vector<int> ra(n);
  thrust::sequence(ra.begin(), ra.end(), n-1, -1);
  thrust::transform_if(az.end()-1, az.end(), a.end()-1, az.end()-1, inc(), is_zero());
  thrust::transform(a.end()-1, a.end(), a.end()-1, inc());
  thrust::inclusive_scan_by_key(thrust::make_permutation_iterator(a.begin(), ra.begin()), thrust::make_permutation_iterator(a.begin(), ra.end()), thrust::make_permutation_iterator(az.begin(), ra.begin()), thrust::make_permutation_iterator(asr.begin(), ra.begin()), head_flag_predicate<int>());
  thrust::transform(a.end()-1, a.end(), a.end()-1, dec());
  // create combined map vector
  thrust::device_vector<int> map(n);
  thrust::counting_iterator<int> idxbegin(0);
  thrust::transform(thrust::make_zip_iterator(thrust::make_tuple(asl.begin(), asr.begin(), idxbegin)), thrust::make_zip_iterator(thrust::make_tuple(asl.end(), asr.end(), idxbegin+n)),  map.begin(), map_functor());
  // use map to create output
  thrust::device_vector<int> result(n);
  thrust::copy(thrust::make_permutation_iterator(a.begin(), map.begin()), thrust::make_permutation_iterator(a.begin(), map.end()), result.begin());
  // display results
  std::cout << "Input vector:" << std::endl;
  thrust::copy(a.begin(), a.end(), std::ostream_iterator<int>(std::cout, " "));
  std::cout << std::endl;
  std::cout << "Output vector:" << std::endl;
  thrust::copy(result.begin(), result.end(), std::ostream_iterator<int>(std::cout, " "));
  std::cout << std::endl;
}
$ nvcc -arch=sm_20 -o t610 t610.cu
$ ./t610
Input vector:
0 0 0 1 0 0 0 0 5 0 0 3 0 0 0 8 0 0
Output vector:
1 1 1 1 1 1 5 5 5 5 3 3 3 3 8 8 8 8
$

Notes:

  1. The above implementation probably has areas that can be improved on, particularly with respect to fusion of operations. However, for understanding purposes, I think fusion makes the code a bit harder to read.

  2. I have really only tested it on the particular example you gave. There may be bugs that you will uncover. My purpose is not to give you a black-box library function that you use but don't understand, but rather to teach you how to write your own code that does what you want.

  3. The "ambiguity" pointed out by JackOLantern is still present in your problem statement. I have obscured it by choosing my map functor behavior to mimic the output you indicated as desired, but simply by creating an equally valid but opposite realization of the map functor (using "if a left dist < a right dist then ..." instead) I can cause the result between 3 and 8 to take the other possible outcome/state. Your comment that "if there is an ambiguity, whoever reaches the position first fill its value to that space" makes no sense to me, unless by that you mean "I don't care which outcome you provide." There is no concept of a particular thread reaching a particular point first. Threads (and blocks) can execute in any order, and this order can change from device to device, and run to run.