What future does the GPU have in computing?

Posted 2019-03-08 20:22

Your CPU may be a quad-core, but did you know that some graphics cards today have over 200 cores? We've already seen what the GPUs in today's graphics cards can do for graphics. Now they can be used for non-graphical tasks as well, and in my opinion the results are nothing short of amazing. An algorithm that lends itself well to parallelism has the potential to be much, much faster on a GPU than it could ever be on a CPU.

There are a few technologies that make all of this possible:

1.) CUDA by NVidia. It seems to be the most well-known and well-documented, but unfortunately it only works on NVidia video cards. I've downloaded the SDK and tried out some of the samples, and there's some awesome stuff being done in CUDA (a minimal sketch of what a CUDA kernel looks like follows this list). But the fact that it's limited to NVidia cards makes me question its future.

2.) Stream by ATI. ATI's equivalent to CUDA. As you might expect, it will only work on ATI cards.

3.) OpenCL - The Khronos Group has put together this standard, but it's still in its infancy. I like the idea of OpenCL, though. The hope is that it will be supported by most video card manufacturers, which should make cross-vendor development that much easier.
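To give a feel for the programming model all three of these share, here is a minimal, hypothetical CUDA sketch (not taken from the SDK samples mentioned above): one thread per output element, which is the shape most GPGPU code takes regardless of the vendor API.

```cuda
// Hypothetical minimal CUDA example: one thread per element of a vector sum.
#include <cstdio>
#include <cuda_runtime.h>

__global__ void vec_add(const float *a, const float *b, float *c, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;   // global thread index
    if (i < n) c[i] = a[i] + b[i];                   // each thread handles one element
}

int main()
{
    const int n = 1 << 20;
    const size_t bytes = n * sizeof(float);

    float *h_a = new float[n], *h_b = new float[n], *h_c = new float[n];
    for (int i = 0; i < n; ++i) { h_a[i] = 1.0f; h_b[i] = 2.0f; }

    float *d_a, *d_b, *d_c;
    cudaMalloc(&d_a, bytes); cudaMalloc(&d_b, bytes); cudaMalloc(&d_c, bytes);
    cudaMemcpy(d_a, h_a, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(d_b, h_b, bytes, cudaMemcpyHostToDevice);

    vec_add<<<(n + 255) / 256, 256>>>(d_a, d_b, d_c, n);  // enough 256-thread blocks to cover n
    cudaMemcpy(h_c, d_c, bytes, cudaMemcpyDeviceToHost);

    printf("c[0] = %f\n", h_c[0]);                        // expect 3.000000
    cudaFree(d_a); cudaFree(d_b); cudaFree(d_c);
    delete[] h_a; delete[] h_b; delete[] h_c;
    return 0;
}
```

Stream and OpenCL express the same idea with different syntax: a kernel written once, launched over a large grid of lightweight threads.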

But what other technologies for non-graphical GPU programming are coming and what shows the most promise? And do you see or would you like to see these technologies being built into some of the mainstream development frameworks like .NET to make it that much easier?

16 answers
老娘就宠你 · #2 · 2019-03-08 20:44

It's true that GPUs can achieve very high performance numbers in data-level parallelism situations, as many here have mentioned. But as I see it, there isn't much use for it in user space right now. I can't help feeling that all this GPGPU propaganda comes from GPU manufacturers, who just want to find new markets and uses for their products. And that's absolutely fine. Have you ever wondered why Intel/AMD haven't included some mini-x86 cores in addition to the standard ones (say, a model with four x86 cores and 64 mini-x86 cores), just to boost data-level parallelism capabilities? They certainly could if they wanted to. My guess is that the industry just doesn't need that kind of processing power in regular desktop/server machines.

Melony? · #3 · 2019-03-08 20:45

Monte Carlo is embarrassingly parallel, but it is a core technique in financial and scientific computing.

One of the respondents is slightly incorrect to say that most real-world challenges do not decompose easily into these types of tasks.

Much tractable scientific investigation is done by leveraging what can be expressed in an embarrassingly parallel manner.

Just because it is named "embarrassingly" parallel does not mean it is not an extremely important field.

I've worked in several financial houses, and we foresee that we can replace farms of 1,000+ Monte Carlo engines (many stacks of blades lined up together) with several large NVidia CUDA installations - massively decreasing power and heat costs in the data centre.
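To illustrate why Monte Carlo maps so directly onto a GPU, here is a small, hypothetical CUDA sketch using the cuRAND device API (the pi-estimation payload is just a stand-in for a real pricing model): each thread draws its own samples independently, and results are combined only at the very end.

```cuda
// Hypothetical sketch: Monte Carlo estimate of pi, one independent sample stream per thread.
#include <cstdio>
#include <curand_kernel.h>

__global__ void monte_carlo_pi(unsigned long long seed, int samples_per_thread,
                               unsigned int *hits)
{
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    curandState state;
    curand_init(seed, tid, 0, &state);       // independent random stream per thread

    unsigned int local_hits = 0;
    for (int i = 0; i < samples_per_thread; ++i) {
        float x = curand_uniform(&state);
        float y = curand_uniform(&state);
        if (x * x + y * y <= 1.0f) ++local_hits;
    }
    atomicAdd(hits, local_hits);             // the only point where threads interact
}

int main()
{
    const int threads = 256, blocks = 256, samples_per_thread = 10000;
    unsigned int *d_hits, h_hits = 0;
    cudaMalloc(&d_hits, sizeof(unsigned int));
    cudaMemcpy(d_hits, &h_hits, sizeof(unsigned int), cudaMemcpyHostToDevice);

    monte_carlo_pi<<<blocks, threads>>>(1234ULL, samples_per_thread, d_hits);
    cudaMemcpy(&h_hits, d_hits, sizeof(unsigned int), cudaMemcpyDeviceToHost);

    double total = (double)threads * blocks * samples_per_thread;
    printf("pi ~= %f\n", 4.0 * h_hits / total);
    cudaFree(d_hits);
    return 0;
}
```

Because the simulations never need to talk to each other until the final reduction, the same structure scales across thousands of GPU threads with almost no communication overhead.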

One significant architectural benefit is that there is also a lot less network load, as far fewer machines need to be fed data and report their results.

Fundamentally, however, such technologies sit at a level of abstraction below a managed runtime language such as C#; we are talking about hardware devices that run their own code on their own processors.

Integration should first come to Matlab and Mathematica, I'd expect, along with the C APIs of course...

成全新的幸福 · #4 · 2019-03-08 20:46

I'm very excited about this technology. However, I think it will only exacerbate the real challenge of large parallel tasks: bandwidth. Adding more cores only increases contention for memory, and OpenCL and other GPGPU abstraction libraries don't offer any tools to improve that.

Any high-performance computing hardware platform will usually be designed with the bandwidth issue carefully planned into the hardware, balancing throughput, latency, caching and cost. As long as commodity hardware, CPUs and GPUs, are designed in isolation from each other, with bandwidth optimized only to their local memory, it will be very difficult to improve this for the algorithms that need it.

再贱就再见 · #5 · 2019-03-08 20:48

A big problem with GPU technology is that while you do have a lot of compute capability in there, getting data into it (and out of it) is terrible, performance-wise. And watch carefully for any comparison benchmarks... they often compare gcc (with minimal optimization and no vectorization) on a single-processor system to the GPU.
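To see that transfer cost for yourself, a small hypothetical micro-benchmark like the one below, timing a host-to-device copy with CUDA events, prints exactly the "getting data in" rate being described (the 256 MiB buffer size is an arbitrary choice for illustration).

```cuda
// Hypothetical micro-benchmark: measure host->device transfer bandwidth with CUDA events.
#include <cstdio>
#include <cuda_runtime.h>

int main()
{
    const size_t bytes = 256u << 20;              // 256 MiB test buffer
    float *h_buf, *d_buf;
    cudaMallocHost(&h_buf, bytes);                // pinned host memory for the best PCIe rate
    cudaMalloc(&d_buf, bytes);

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaEventRecord(start);
    cudaMemcpy(d_buf, h_buf, bytes, cudaMemcpyHostToDevice);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    printf("host->device: %.2f GB/s\n", (bytes / 1e9) / (ms / 1e3));

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFreeHost(h_buf);
    cudaFree(d_buf);
    return 0;
}
```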

Another big problem with GPUs is that if you don't CAREFULLY think about how your data is organized, you will suffer a real performance hit internally (in the GPU). This often involves rewriting very simple code into a convoluted pile of rubbish.
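A hypothetical illustration of what "how your data is organized" means in practice: both kernels below just copy data, but the second one's strided access pattern defeats memory coalescing, so each warp's loads are scattered across many cache lines and effective bandwidth per element drops sharply.

```cuda
// Hypothetical sketch: the same copy written with coalesced vs. strided access.
#include <cuda_runtime.h>

__global__ void copy_coalesced(const float *in, float *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;    // adjacent threads touch adjacent words
    if (i < n) out[i] = in[i];
}

__global__ void copy_strided(const float *in, float *out, int n, int stride)
{
    // Adjacent threads touch words that are `stride` elements apart,
    // so each memory transaction delivers mostly unused data.
    int i = (blockIdx.x * blockDim.x + threadIdx.x) * stride;
    if (i < n) out[i] = in[i];
}

int main()
{
    const int n = 1 << 24;
    float *d_in, *d_out;
    cudaMalloc(&d_in, n * sizeof(float));
    cudaMalloc(&d_out, n * sizeof(float));

    copy_coalesced<<<(n + 255) / 256, 256>>>(d_in, d_out, n);
    // Copies only every 8th element, yet the per-element cost is several times higher.
    copy_strided<<<(n / 8 + 255) / 256, 256>>>(d_in, d_out, n, 8);
    cudaDeviceSynchronize();

    cudaFree(d_in); cudaFree(d_out);
    return 0;
}
```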

狗以群分 · #6 · 2019-03-08 20:54

I think you can count the next DirectX as another way to use the GPU.

From my experience, GPUs are extremely fast for algorithms that are easy to parallelize. I recently optimized a special image-resizing algorithm in CUDA to be more than 100 times faster on the GPU (not even a high-end one) than on a quad-core Intel processor. The problem was getting the data to the GPU and then fetching the result back to main memory; both directions were limited by the memcpy() speed on that machine, which was less than 2 GB/s. As a result, the algorithm as a whole ended up only slightly faster than the CPU version...
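When the data can be processed in independent chunks, one standard way to hide part of that transfer cost is to overlap copies with kernel execution using CUDA streams. A hypothetical sketch follows; the chunking scheme and the `process` kernel are illustrative stand-ins, and the host buffer is assumed to be pinned.

```cuda
// Hypothetical sketch: overlap host<->device copies with kernel work using two streams.
#include <cuda_runtime.h>

__global__ void process(float *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2.0f;                      // stand-in for the real per-element work
}

void process_in_chunks(float *h_pinned, float *d_buf, int total, int chunk)
{
    cudaStream_t streams[2];
    cudaStreamCreate(&streams[0]);
    cudaStreamCreate(&streams[1]);

    // Alternate chunks between the two streams so the copy of one chunk
    // can proceed while the previous chunk is still being processed.
    for (int off = 0, s = 0; off < total; off += chunk, s ^= 1) {
        int n = (total - off < chunk) ? (total - off) : chunk;
        cudaMemcpyAsync(d_buf + off, h_pinned + off, n * sizeof(float),
                        cudaMemcpyHostToDevice, streams[s]);
        process<<<(n + 255) / 256, 256, 0, streams[s]>>>(d_buf + off, n);
        cudaMemcpyAsync(h_pinned + off, d_buf + off, n * sizeof(float),
                        cudaMemcpyDeviceToHost, streams[s]);
    }
    cudaStreamSynchronize(streams[0]);
    cudaStreamSynchronize(streams[1]);
    cudaStreamDestroy(streams[0]);
    cudaStreamDestroy(streams[1]);
}
```

This doesn't make the pipe any faster, of course; it only keeps the GPU busy while data is in flight, which helps when compute time per chunk is comparable to transfer time.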

So it really depends. If you have a scientific application where you can keep most of the data on the GPU, and all of your algorithms map to a GPU implementation, then fine. Otherwise I would wait until there's a faster pipe between CPU and GPU, or see what ATI has up its sleeve with a combined chip...

About which technology to use: I think once you have your stuff running in CUDA, the additional step of porting it to OpenCL (or another language) is not that large. You did all the heavy work by parallelizing your algorithms; the rest is just a different 'flavor'.

做个烂人 · #7 · 2019-03-08 20:56

GHC (Haskell) researchers (working for Microsoft Research) are adding support for Nested Data Parallelism directly to a general purpose programming language. The idea is to use multiple cores and/or GPUs on the back end yet expose data parallel arrays as a native type in the language, regardless of the runtime executing the code in parallel (or serial for the single-CPU fallback).

http://www.haskell.org/haskellwiki/GHC/Data_Parallel_Haskell

Depending on the success of this in the next few years, I would expect to see other languages (C# specifically) pick up on the idea, which could bring these sorts of capabilities to a more mainstream audience. Perhaps by that time the CPU-GPU bandwidth and driver issues will be resolved.
