I have some functions that load a variable into constant device memory and then launch a kernel. I noticed that the first time one of these functions loads a variable into constant memory it takes 0.6 seconds, but subsequent loads into constant memory are very fast (0.0008 seconds). This behaviour occurs regardless of which function is called first in main. Below is an example:
__constant__ double res1;
__global__ void kernel1(...) { ... }

void function1() {
    double resHost = 255 / ((double) size);
    CUDA_CHECK_RETURN(cudaMemcpyToSymbol(res1, &resHost, sizeof(double)));
    // prepare and launch kernel
}

__constant__ double res2;
__global__ void kernel2(...) { ... }

void function2() {
    double resHost = 255 / ((double) size);
    CUDA_CHECK_RETURN(cudaMemcpyToSymbol(res2, &resHost, sizeof(double)));
    // prepare and launch kernel
}
int main() {
    function1(); // takes 0.6 seconds for the load
    function2(); // takes 0.0008 seconds for the load
    function1(); // takes 0.0008 seconds for the load
    return 0;
}
Why is this happening? Can I avoid it?
Lazy runtime API context establishment and setup.
No. The first runtime API call to require a context will incur significant setup latency; in your case that is the first cudaMemcpyToSymbol call.
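
You cannot remove that cost, but you can choose where it lands. A minimal sketch (assuming a single-GPU system and device 0) that issues a throw-away runtime call at the start of main, so the lazy context creation happens there instead of inside the first timed cudaMemcpyToSymbol:

#include <cuda_runtime.h>
#include <cstdio>

__constant__ double res1;   // same kind of symbol as in the question

int main() {
    // Warm-up: any runtime call that needs a context forces the lazy
    // context creation here. cudaFree(0) is a common idiom for this;
    // cudaSetDevice(0) assumes device 0 is the GPU you want.
    cudaSetDevice(0);
    cudaFree(0);

    // Example value standing in for the question's computation.
    double resHost = 255.0 / 1024.0;

    // This copy no longer absorbs the context-creation latency.
    cudaError_t err = cudaMemcpyToSymbol(res1, &resHost, sizeof(double));
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaMemcpyToSymbol failed: %s\n", cudaGetErrorString(err));
        return 1;
    }
    return 0;
}

The total startup time is the same; the warm-up only keeps the one-off setup out of whatever you are timing afterwards.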