I have the following trivial thrust::gather program (taken directly from the thrust::gather documentation)
#include <thrust/gather.h>
#include <thrust/device_vector.h>
int main(void)
{
// mark even indices with a 1; odd indices with a 0
int values[10] = {1, 0, 1, 0, 1, 0, 1, 0, 1, 0};
thrust::device_vector<int> d_values(values, values + 10);
// gather all even indices into the first half of the range
// and odd indices to the last half of the range
int map[10] = {0, 2, 4, 6, 8, 1, 3, 5, 7, 9};
thrust::device_vector<int> d_map(map, map + 10);
thrust::device_vector<int> d_output(10);
thrust::gather(d_map.begin(), d_map.end(),
d_values.begin(),
d_output.begin());
// d_output is now {1, 1, 1, 1, 1, 0, 0, 0, 0, 0}
return 0;
}
I compile this with
/usr/local/cuda/bin/nvcc -ccbin g++ -I../../common/inc -m64 -g -G -gencode arch=compute_30,code=sm_30 -o thrustGather.o -c thrustGather.cu
/usr/local/cuda/bin/nvcc -ccbin g++ -m64 -g -G -o thrustGather thrustGather.o
Next I try running this simple program after first attaching it to cuda-gdb:
>cuda-gdb ./thrustGather
NVIDIA (R) CUDA Debugger
5.5 release
Portions Copyright (C) 2007-2013 NVIDIA Corporation
GNU gdb (GDB) 7.2
Copyright (C) 2010 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-unknown-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /usr/local/cuda-5.5/samples/0_Simple/thrustGatherRjm/thrustGather...done.
(cuda-gdb) run
Starting program: /usr/local/cuda-5.5/samples/0_Simple/thrustGatherRjm/thrustGather
[Thread debugging using libthread_db enabled]
[New Thread 0x7ffff7272700 (LWP 50318)]
[Context Create of context 0x78d790 on Device 0]
[Launch of CUDA Kernel 0 (memset32_aligned1D<<<(1,1,1),(128,1,1)>>>) on Device 0]
[Launch of CUDA Kernel 1 (memset32_aligned1D<<<(1,1,1),(128,1,1)>>>) on Device 0]
[Launch of CUDA Kernel 2 (memset32_aligned1D<<<(1,1,1),(128,1,1)>>>) on Device 0]
[Launch of CUDA Kernel 3 (memset32_aligned1D<<<(1,1,1),(128,1,1)>>>) on Device 0]
[Launch of CUDA Kernel 4 (memset32_aligned1D<<<(1,1,1),(128,1,1)>>>) on Device 0]
[Launch of CUDA Kernel 5 (memset32_aligned1D<<<(1,1,1),(128,1,1)>>>) on Device 0]
[Launch of CUDA Kernel 6 (memset32_aligned1D<<<(1,1,1),(128,1,1)>>>) on Device 0]
[Launch of CUDA Kernel 7 (memset32_aligned1D<<<(1,1,1),(128,1,1)>>>) on Device 0]
Error: received unexpected signal: Segmentation fault
BACKTRACE (41 frames):
cuda-gdb[0x4394e1]
/lib64/libc.so.6[0x3d96635a90]
cuda-gdb[0x5b038b]
cuda-gdb[0x55aae8]
cuda-gdb[0x55ed65]
cuda-gdb[0x55fc51]
cuda-gdb[0x55ec22]
cuda-gdb[0x5609fe]
cuda-gdb[0x5607bd]
cuda-gdb[0x560c36]
cuda-gdb[0x4f7e44]
cuda-gdb[0x4f8038]
cuda-gdb[0x4fde3c]
cuda-gdb[0x5c9f66]
cuda-gdb[0x429c3c]
cuda-gdb[0x5ca4e5]
cuda-gdb[0x5cab5e]
cuda-gdb[0x4296e6]
cuda-gdb[0x479366]
cuda-gdb[0x53addd]
cuda-gdb[0x5129c0]
cuda-gdb[0x5134fd]
cuda-gdb[0x51369d]
cuda-gdb[0x5091e7]
cuda-gdb[0x40f65d]
cuda-gdb[0x522f54]
cuda-gdb[0x523a20]
cuda-gdb[0x5ff9aa]
cuda-gdb[0x522fb9]
cuda-gdb[0x521b81]
cuda-gdb[0x522b1e]
cuda-gdb[0x51d0cb]
cuda-gdb[0x4ae816]
cuda-gdb[0x406429]
cuda-gdb[0x51d0cb]
cuda-gdb[0x406b76]
cuda-gdb[0x51d0cb]
cuda-gdb[0x406204]
cuda-gdb[0x4061d6]
/lib64/libc.so.6(__libc_start_main+0xf5)[0x3d96621b75]
cuda-gdb[0x4060e9]
[Termination of CUDA Kernel 7 (memset32_aligned1D<<<(1,1,1),(128,1,1)>>>) on Device 0]
[Termination of CUDA Kernel 6 (memset32_aligned1D<<<(1,1,1),(128,1,1)>>>) on Device 0]
[Termination of CUDA Kernel 5 (memset32_aligned1D<<<(1,1,1),(128,1,1)>>>) on Device 0]
[Termination of CUDA Kernel 4 (memset32_aligned1D<<<(1,1,1),(128,1,1)>>>) on Device 0]
[Termination of CUDA Kernel 3 (memset32_aligned1D<<<(1,1,1),(128,1,1)>>>) on Device 0]
[Termination of CUDA Kernel 2 (memset32_aligned1D<<<(1,1,1),(128,1,1)>>>) on Device 0]
[Termination of CUDA Kernel 1 (memset32_aligned1D<<<(1,1,1),(128,1,1)>>>) on Device 0]
[Termination of CUDA Kernel 0 (memset32_aligned1D<<<(1,1,1),(128,1,1)>>>) on Device 0]
Note that cuda-gdb, itself, segfaults. I also ran the correpsonding expamples for
- thrust::gather_if
- thrust::scatter
- thrust::count
- thrust::inclusive_scan
- thrust::sort_by_key
- thrust::reduce_by_key
And of these, only the last three (inclusive_scan, sort_by_key, reduce_by_key), work (i.e., do not crash cuda-gdb).
This must be an issue with the latest release (5.5) of thrust and/or cuda-gdb, because I have run the same tests with release 5.0 with no issues whatsoever.
Here is some info about my setup:
> cat /proc/driver/nvidia/version
NVRM version: NVIDIA UNIX x86_64 Kernel Module 319.21 Sat May 11 23:51:00 PDT 2013
GCC version: gcc version 4.8.1 20130603 (Red Hat 4.8.1-1) (GCC)
> cat /proc/version
Linux version 3.9.9-302.fc19.x86_64 (mockbuild@bkernel01.phx2.fedoraproject.org) (gcc version 4.8.1 20130603 (Red Hat 4.8.1-1) (GCC) ) #1 SMP Sat Jul 6 13:41:07 UTC 2013
> gcc --version
gcc (GCC) 4.8.1 20130603 (Red Hat 4.8.1-1)
Copyright (C) 2013 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
> lspci | grep NVIDIA
05:00.0 3D controller: NVIDIA Corporation GK104 [GeForce GTX 690] (rev a1)
05:00.1 Audio device: NVIDIA Corporation GK104 HDMI Audio Controller (rev a1)
06:00.0 VGA compatible controller: NVIDIA Corporation GK104 [GeForce GTX 690] (rev a1)
06:00.1 Audio device: NVIDIA Corporation GK104 HDMI Audio Controller (rev a1)
As talonmies indicated, the problem is the Thrust libraries do not run correctly when built with debugging. In my application I have a rather complicated .cu file containing several of my own CUDA kernels, along with multiple Thrust calls. If I were to compile this file with the -g -G debug flags and run inside cuda-gdb, it would crash -- making it impossible for me to debug my kernels.
Since I do not care about debugging the Thrust calls themselves (only my kernels), my solution involved putting all my thrust calls in another file thrustWrappers.cu and compiling this file without debugging. Then in my main .cu file I would replace the calls to thrust with associated wrapper functions (defined in thrustWrappers). For example,
became
Then I would link the two resulting object files together.