We are currently trying to optimize a system with at least 12 variables; the total number of combinations of these variables is over 1 billion. This is not deep learning, machine learning, TensorFlow, or anything similar, but arbitrary calculations on time series data.
We have implemented our code in Python and run it successfully on the CPU. We also tried multiprocessing, which works well, but we need faster computation since the calculation takes weeks. We have a GPU system consisting of 6 AMD GPUs. We would like to run our code on this GPU system but do not know how to do so.
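To make the setup concrete, here is a minimal sketch of the kind of parameter sweep we run with multiprocessing (the function and the grid are illustrative placeholders, not our actual code):

```python
import itertools
from multiprocessing import Pool

import numpy as np

def evaluate(params):
    """Run the time-series calculation for one combination of variables (placeholder)."""
    a, b, c = params
    series = np.arange(1_000, dtype=np.float64)
    return params, float(np.sum(a * series ** 2 + b * series + c))

if __name__ == "__main__":
    # a tiny 3-variable grid; the real problem has ~12 variables and >1e9 combinations
    grid = itertools.product(range(10), range(10), range(10))
    with Pool() as pool:
        for params, score in pool.imap_unordered(evaluate, grid, chunksize=100):
            pass  # keep the best result, write to disk, etc.
```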
My questions are:
- Can we run our simple Python code on my laptop with an AMD GPU?
- Can we run the same app on our GPU system?
We read that we need to adjust the code for GPU computation but we do not know how to do that.
PS: I can add more information if you need. I tried to keep the post as simple as possible to avoid confusion.
There are at least two options to speed up calculations using the GPU, both described below: Numba and custom kernels via PyOpenCL.
However, I usually don't recommend running code on the GPU from the start. Calculations on the GPU are not always faster; it depends on how complex they are and how good your CPU and GPU implementations are. If you follow the steps below, you will get a good idea of what to expect.
If your code is pure Python (lists, floats, for-loops, etc.) you can see a huge speed-up (maybe up to 100x) by using vectorized NumPy code. This is also an important step towards figuring out how your GPU code could be implemented, because vectorized NumPy calculations follow a similar scheme. The GPU performs better on small tasks that can be parallelized.
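As a minimal illustration of what that step looks like (the calculation is a stand-in, not your actual one), a pure-Python loop and its vectorized NumPy equivalent:

```python
import numpy as np

# Pure Python: explicit loop over a time series
def score_python(series, a, b):
    total = 0.0
    for x in series:
        total += a * x * x + b * x
    return total

# Vectorized NumPy: the same calculation expressed on whole arrays
def score_numpy(series, a, b):
    return float(np.sum(a * series ** 2 + b * series))

series = np.random.rand(1_000_000)
assert np.isclose(score_python(series, 2.0, 3.0), score_numpy(series, 2.0, 3.0))
```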
Once you have a well-optimized NumPy example, you can try to get a first peek at the GPU speed-up by using Numba. For simple cases you can just decorate your NumPy functions to run on the GPU. You can expect a speed-up of 100 to 500 compared to NumPy code, if your problem can be parallelized / vectorized.
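A hedged sketch of what that can look like, assuming Numba's CUDA target (which requires an NVIDIA GPU; whether Numba fits your AMD setup is a separate question), with a trivial elementwise kernel standing in for the real calculation:

```python
import numpy as np
from numba import cuda

@cuda.jit
def scale_series(series, out, a, b):
    # each GPU thread handles one element of the time series
    i = cuda.grid(1)
    if i < series.size:
        out[i] = a * series[i] * series[i] + b * series[i]

series = np.random.rand(1_000_000).astype(np.float32)
out = np.empty_like(series)
threads_per_block = 256
blocks = (series.size + threads_per_block - 1) // threads_per_block
# arrays are transferred to and from the GPU automatically on the kernel call
scale_series[blocks, threads_per_block](series, out, 2.0, 3.0)
```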
You may get this far without writing any OpenCL C code for the GPU and still have your code running on it. But if your problem is too complex, you will have to write a custom kernel and run it using PyOpenCL. The expected speed-up is also 100 to 500 compared to good NumPy code.
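A minimal sketch of a custom PyOpenCL kernel (again, the kernel is a trivial placeholder for the real calculation):

```python
import numpy as np
import pyopencl as cl

series = np.random.rand(1_000_000).astype(np.float32)
out = np.empty_like(series)

ctx = cl.create_some_context()   # picks an available OpenCL device, e.g. an AMD GPU
queue = cl.CommandQueue(ctx)

kernel_src = """
__kernel void scale_series(__global const float *x, __global float *out, float a, float b) {
    int i = get_global_id(0);
    out[i] = a * x[i] * x[i] + b * x[i];
}
"""
prg = cl.Program(ctx, kernel_src).build()

mf = cl.mem_flags
x_buf = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=series)
out_buf = cl.Buffer(ctx, mf.WRITE_ONLY, out.nbytes)

# one work item per element of the time series
prg.scale_series(queue, series.shape, None, x_buf, out_buf, np.float32(2.0), np.float32(3.0))
cl.enqueue_copy(queue, out, out_buf)
```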
The important thing to remember is that the GPU is only powerful if you use it correctly, and only for a certain set of problems.
If you have a small example of your code feel free to post it.
One more thing: CUDA is often easier to use than OpenCL. There are more libraries, more examples, more documentation, and more support. Nvidia has done a very good job of not supporting OpenCL well from the very start. I usually prefer open standards, but we moved to CUDA and Nvidia hardware quickly once things became commercial.