Understanding the usage of OpenCL in OpenCV (Mat/

I ran the code below to check for the performance difference between GPU and CPU usage. I am calculating the Average time for cv::cvtColor() function. I make four function calls:

Just_mat()(Without using OpenCL for Mat object)
Just_UMat()(Without using OpenCL for Umat object)
OpenCL_Mat()(using OpenCL for Mat object)
OpenCL_UMat() (using OpenCL for UMat object)

for both CPU and GPU.
I did not find a huge performance difference between GPU and CPU usage.

int main(int argc, char* argv[])
{
    loc = argv[1];
    just_mat(loc);// Calling function Without OpenCL 
    just_umat(loc);//Calling function Without OpenCL 
    cv::ocl::Context context;
    std::vector<cv::ocl::PlatformInfo> platforms;
    cv::ocl::getPlatfomsInfo(platforms);
    for (size_t i = 0; i < platforms.size(); i++)
    {
        //Access to Platform
        const cv::ocl::PlatformInfo* platform = &platforms[i];

        //Platform Name
        std::cout << "Platform Name: " << platform->name().c_str() << "\n" << endl;

        //Access Device within Platform
        cv::ocl::Device current_device;
        for (int j = 0; j < platform->deviceNumber(); j++)
        {
            //Access Device
            platform->getDevice(current_device, j);
            int deviceType = current_device.type();
            cout << "Device name:  " << current_device.name() << endl;
            if (deviceType == 2)
                cout << context.ndevices() << " CPU devices are detected." << std::endl;
            if (deviceType == 4)
                cout << context.ndevices() << " GPU devices are detected." << std::endl;
            cout << "===============================================" << endl << endl;
            switch (deviceType) 
            {
            case (1 << 1):
                cout << "CPU device\n";
                if (context.create(deviceType))
                    opencl_mat(loc);//With OpenCL Mat
                break;
            case (1 << 2):
                cout << "GPU device\n";              
                if (context.create(deviceType))
                    opencl_mat(loc);//With OpenCL UMat
                break;
            }
            cin.ignore(1);
        }
    }
    return 0;
}
int just_mat(string loc);// I check for the average time taken for cvtColor() without using OpenCl
int just_umat(string loc);// I check for the average time taken for cvtColor() without using OpenCl
int opencl_mat(string loc);//ocl::setUseOpenCL(true); and check for time difference for cvtColor function
int opencl_umat(string loc);//ocl::setUseOpenCL(true); and check for time difference for cvtColor function

The output(in miliseconds) for the above code is
__________________________________________
|GPU Name|With OpenCL Mat | With OpenCl UMat|
|_________________________________________|
|--Carrizo---|------7.69052 ------ |------0.247069-------|
|_________________________________________|
|---Island--- |-------7.12455------ |------0.233345-------|
|_________________________________________|

__________________________________________
|----CPU---|With OpenCL Mat | With OpenCl UMat |
|_________________________________________|
|---AMD---|------6.76169 ------ |--------0.231103--------|
|_________________________________________|

________________________________________________
|----CPU---| WithOut OpenCL Mat | WithOut OpenCl UMat |
|_______________________________________________|
|----AMD---|------7.15959------ |------------0.246138------------ |
|_______________________________________________|

In code, using Mat Object always runs on CPU & using UMat Object always runs on GPU, irrespective of the code ocl::setUseOpenCL(true/false);
Can anybody explain the reason for all output time variation?

One more question, i didn't use any OpenCL specific .dll with .exe file and yet GPU was used without any error, while building OpenCV with Cmake i checked With_OpenCL did this built all OpenCL required function within opencv_World310.dll ?

In code, using Mat Object always runs on CPU & using UMat Object always runs on GPU, irrespective of the code ocl::setUseOpenCL(true/false);

I'm sorry, because I'm not sure if this is a question or a statement... in either case it's partially true. In 3.0, for the UMat, if you don't have a dedicated GPU then OpenCV just runs everything on the CPU. If you specifically ask for Mat you get it on the CPU. And in your case you have directed both to run on each of your GPUs/CPU by selecting each specifically (more on "choosing a CPU below)... read this:

Few design choices support the new architecture:

A unified abstraction cv::UMat that enables the same APIs to be implemented using CPU or OpenCL code, without a requirement to call OpenCL accelerated version explicitly. These functions use an OpenCL -enabled GPU if exists in the system, and automatically switch to CPU operation otherwise.

The UMat abstraction enables functions to be called asynchronously. Unlike the cv::Mat of the OpenCV version 2.x, access to the underlyi ng data for the cv::UMat is performed through a method of class, and not though its data member. Such an approach enables the implementation to explicitly wait for GPU completion only when CPU code absolutely needs the result.

The UMat implementation makes use of CPU-GPU shared physical memory available on Intel SoCs, including allocations that come from pointers passed into OpenCV.

I think there also might be a misunderstanding about "using OpenCL". When you use an UMat, you are specifically trying to use the GPU. And, I'll plead some ignorance here, as a result I believe that CV is using some of the CL library to make that happen automatically... as a side in 2.X we had cv::ocl to specifically/manually do this, so be careful if you are using that 2.X legacy code in 3.X. There are reasons to do it, but they are not always straightforward. But, back on topic, when you say,

with OpenCL UMat

you are potentially being redundant. The CL code you have in your snippet is basically finding out what equipment is installed, how many there are, what their names are, and choosing which to use... I'd have to dig through the way it is instantiated, but perhaps when you make it UMat it automatically sets OpenCL to True? (link) That would definitely support the data you presented. You could probably test that idea by checking what the state of ocl::setUseOpenCL after you set it to false and then use an UMat.

Finally, I'm guessing your CPU has a built in GPU. So it is running parallel processing with OpenCL and not paying a time penalty to travel to the seperate/dedicated GPU and back, hence your perceived performance increase over the GPUs (since it is not technically the CPU running it)... only when you are specifically using the Mat is the CPU only being used.

Your last question, I'm not sure... this is my speculation: OpenCL architexture exists on the GPU, when you install CV with CL you are installing the link between the two libraries and associated header files. I'm not sure which dll files you need to make that magic happen.