CUDA: Why does compute_20 code fail on compute_35?

Posted 2019-08-02 07:54

For a computer with Titan GPU (compute_35,sm_35), I compiled some code using this line in CMakeLists.txt:

set(CUDA_NVCC_FLAGS ${CUDA_NVCC_FLAGS};-gencode arch=compute_35,code=sm_35)

The code compiles and also runs fine.

I wanted to check what compilation problems this code would cause for a friend who uses a GTS 450 (compute_20,sm_21). So, I changed the above line to:

set(CUDA_NVCC_FLAGS ${CUDA_NVCC_FLAGS};-gencode arch=compute_20,code=sm_21)

The code compiles without any errors on my computer with the Titan. But when I run it (again on my Titan computer), it fails after a thrust::copy call with the following error:

$ ./foobar
terminate called after throwing an instance of 'thrust::system::system_error'
  what():  invalid device function 
"foobar" terminated by signal SIGABRT (Abort)

Google says the above error is caused by a GPU architecture mismatch.

The strangest part is that with the above line (arch=compute_20,code=sm_21), the code compiles and runs without error on my friend's computer with GTS 450! Except for the GPU, her Ubuntu 12.04, gcc and CUDA SDK 5.5 versions are the same as mine.

Is this the real cause of the error? Why can't the Titan run compute_20 code? Isn't a CUDA GPU supposed to be backward compatible with older PTX or SASS code? Even if it isn't, why can't the driver JIT-compile the compute_20 PTX to sm_35 SASS?

Tags: cuda cmake
1 answer
爱情/是我丢掉的垃圾
#2 · 2019-08-02 08:29

If you specify:

-gencode arch=compute_20,code=compute_20

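your code should run (via JIT) on either GPU. With code=sm_21 alone, the binary contains only sm_21 SASS and no PTX, so the Titan has nothing it can execute and the driver has nothing to JIT-compile.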

According to the nvcc manual, JIT is directly enabled when you specify a virtual architecture for the code switch. You can make multiple specifications in a single command:

-arch=compute_20 -code=compute_20,sm_21,sm_35

(note this is in lieu of specifying -gencode ...)

which would allow JIT from compute_20 PTX, and non-JIT execution directly on cc 2.1 or cc 3.5 devices.
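
For the CMakeLists.txt in the question, a roughly equivalent setup could look like the sketch below. It assumes the same FindCUDA-style CUDA_NVCC_FLAGS variable the question uses, and it writes one -gencode clause per target rather than the combined -arch/-code form. The choice of targets is taken from the two GPUs mentioned in the question; adjust it to the devices you actually need.

```cmake
# Sketch only: assumes the FindCUDA-style CUDA_NVCC_FLAGS variable from the question.

# Native SASS for the GTS 450 (cc 2.1), generated from compute_20 PTX.
set(CUDA_NVCC_FLAGS ${CUDA_NVCC_FLAGS};-gencode arch=compute_20,code=sm_21)

# Native SASS for the Titan (cc 3.5).
set(CUDA_NVCC_FLAGS ${CUDA_NVCC_FLAGS};-gencode arch=compute_35,code=sm_35)

# Embed compute_20 PTX as well, so the driver can JIT-compile it
# for a GPU that has no matching SASS in the binary.
set(CUDA_NVCC_FLAGS ${CUDA_NVCC_FLAGS};-gencode arch=compute_20,code=compute_20)
```

With all three clauses, the binary carries SASS for both GPUs plus PTX as a fallback. Dropping the last clause reproduces the situation in the question, where a device without matching SASS reports "invalid device function".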
