I am trying to implement a brute force distance computation algorithm in CUDA.
#define VECTOR_DIM 128
thrust::device_vector<float> feature_data_1;
feature_data_1.resize(VECTOR_DIM * 1000); // 1000 128 dimensional points
thrust::device_vector<float> feature_data_2;
feature_data_2.resize(VECTOR_DIM * 2000); // 2000 128 dimensional points
Now what I would like to do is to compute the L2
distances (sum of the squared differences) from every vector in the first matrix to every vector in the second matrix.
So, if array 1
is of size 1000
and array 2
is of size 2000
, the result would be a floating point matrix of 1000*2000
in size.
I was wondering if there is a way to achieve this using Thrust algorithms alone.
Calculating the all-pairs distances between points in two different sets in CUDA can be solved by observing that
where
|| ||
is thel2
norm and<x,y>
denotes the scalar product betweenx
andy
.The norms
||x||
and||y||
can be calculated by approaches inspired by Reduce matrix rows with CUDA, while the scalar products<x,y>
can then be calculated as the matrix-matrix multiplicationX*Y^T
usingcublas<t>gemm()
.Below is a fully worked out implementation. Please, note that for the calculation of the norms
|| ||
two approaches are reported, one usingcuBLAS
cublas<t>gemv
and one using Thurst'stransform
. For the problem size of your interest, I have experienced the following timings on my GT540M card:The
Utilities.cu
andUtilities.cuh
files are mantained here and omitted here. TheTimingGPU.cu
andTimingGPU.cuh
are maintained here and are omitted as well.