Use cmake to configure cuda project for vs2013 and

Posted 2019-09-18 23:32

I use the CMake GUI tool to configure my CUDA project for VS2013. My CMakeLists.txt is below:

project(CUDA_PART)
# required cmake version
cmake_minimum_required(VERSION 3.0)

include_directories(${CUDA_PART_SOURCE_DIR}/common)
# packages
find_package(CUDA REQUIRED)

# nvcc flags
set(CUDA_NVCC_FLAGS -gencode arch=compute_20,code=sm_20;-G;-g)
set(CUDA_VERBOSE_BUILD ON)

#FILE(GLOB SOURCES "*.cu" "*.cpp" "*.c" "*.h")
CUDA_ADD_EXECUTABLE(CUDA_PART hist_gpu_shmem_atomics.cu)

The .cu file is hist_gpu_shmem_atomics.cu from the CUDA by Example source code.

There are two problems:

  1. After the line histo_kernel<<<blocks * 2, 256>>>(dev_buffer, SIZE, dev_histo); an "invalid device function" error occurs.

  2. When I use the CUDA debugging tool, it cannot trigger breakpoints in the device code.

But when I create a project with the same code from the CUDA project template in Visual Studio 2013, it works correctly!

So, is there something wrong in the CMakeLists.txt?

OS: Win7 64-bit; GPU: GTX 960; CUDA: 7.5; VS: 2013 (and also 2010)


When I set "Code Generation" in the properties of a project created from the CUDA project template in VS2013, the resulting nvcc flag turns out to be -gencode=arch=compute_20,code=\"sm_20,compute_20\". This is equivalent to:

-gencode=arch=compute_20,code=sm_20 \
-gencode=arch=compute_20,code=compute_20 

So I guess this generates two versions of the device code: the first (SASS) built for a virtual and a real architecture, and the second (PTX) for the virtual architecture only. Since my GTX960 is a cc5.2 device, the runtime picks the second one (PTX) and JIT-compiles it into suitable SASS.
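For comparison, the two -gencode entries above could be reproduced in the CMakeLists.txt roughly as follows. This is only a sketch: it assumes the FindCUDA module used in the question, where CUDA_NVCC_FLAGS is a semicolon-separated list, and it has not been tested against this exact setup.

```cmake
# Place after find_package(CUDA REQUIRED) and before CUDA_ADD_EXECUTABLE.
# Mirror the Visual Studio template: emit cc 2.0 SASS and compute_20 PTX.
# The PTX entry is what allows the driver to JIT-compile for a newer GPU.
list(APPEND CUDA_NVCC_FLAGS
     -gencode=arch=compute_20,code=sm_20
     -gencode=arch=compute_20,code=compute_20)
```

The difference from the original CMakeLists.txt is the second entry; with code=sm_20 alone, no PTX is embedded in the executable.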

1 answer
迷人小祖宗
#2 · 2019-09-19 00:04

This is a problem:

set(CUDA_NVCC_FLAGS -gencode arch=compute_20,code=sm_20;-G;-g)

Those flags will cause nvcc to generate SASS code (only) for a cc 2.0 device (only). Such cc2.0 SASS code will not run on your cc5.2 device (GTX960). "Invalid device function" is exactly the error you would get when trying to launch a kernel in such a scenario. Since the kernel will never launch, trying to hit breakpoints in device code won't work.

I'm not a CMake expert, so there might be other, more sensible approaches, but one possible way to try to fix this might be:

set(CUDA_NVCC_FLAGS -gencode arch=compute_52,code=sm_52;-G;-g)

which should generate code for your cc5.2 device. There are undoubtedly other possible settings here; you may want to read this or the nvcc manual for more background on compile options to target specific devices.
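As one of those other possible settings, a variant that also embeds PTX might look like the sketch below. It assumes the same FindCUDA-based CMakeLists.txt as in the question and keeps the original -G and -g switches; treat it as an illustration rather than a verified configuration.

```cmake
# Place after find_package(CUDA REQUIRED) and before CUDA_ADD_EXECUTABLE.
# Native SASS for the GTX 960 (cc 5.2), plus compute_52 PTX so that
# GPUs newer than cc 5.2 can still JIT-compile the kernels at load time.
set(CUDA_NVCC_FLAGS
    -gencode=arch=compute_52,code=sm_52
    -gencode=arch=compute_52,code=compute_52
    -G -g)
```

Embedding PTX makes the binary somewhat larger, but it avoids another "invalid device function" error if the executable is later run on a newer GPU.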

Also note that -G generates device debug code, which is fine if that is what you want. However, it will generally run slower than code compiled without that switch. If you want to debug device code, that switch is necessary.
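One way to keep the debug switches out of optimized builds is to attach them to the Debug configuration only. The sketch below relies on the per-configuration CUDA_NVCC_FLAGS_<CONFIG> variables described in the FindCUDA documentation; whether it behaves as intended with the VS2013 generator and CUDA 7.5 is an assumption that has not been checked here.

```cmake
# Place after find_package(CUDA REQUIRED) and before CUDA_ADD_EXECUTABLE.
# Debug configuration only: device debug code (-G) and host debug info (-g).
set(CUDA_NVCC_FLAGS_DEBUG -G -g)
```

With this in place, -G and -g would be removed from the general CUDA_NVCC_FLAGS line so that Release builds are compiled without device debug code.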
