What future does the GPU have in computing? [closed]

Published 2019-03-08 20:22

Your CPU may be a quad-core, but did you know that some graphics cards today have over 200 cores? We've already seen what the GPUs in today's graphics cards can do when it comes to graphics. Now they can be used for non-graphical tasks as well, and in my opinion the results are nothing short of amazing. An algorithm that lends itself well to parallelism has the potential to be much, much faster on a GPU than it could ever be on a CPU.
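
To give a rough feel for why that is, here is a minimal CUDA-style sketch (the kernel and names are made up, just to illustrate the idea): each GPU thread handles one array element, so thousands of elements are processed at the same time.

    #include <cuda_runtime.h>

    // Each thread computes one output element, so the work is spread
    // across thousands of threads running concurrently on the GPU.
    __global__ void scaleAndAdd(const float* a, const float* b, float* out, int n)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;  // global thread index
        if (i < n)
            out[i] = 2.0f * a[i] + b[i];                // independent per-element work
    }

    // Host-side launch: one thread per element, 256 threads per block.
    void launchScaleAndAdd(const float* d_a, const float* d_b, float* d_out, int n)
    {
        int threadsPerBlock = 256;
        int blocks = (n + threadsPerBlock - 1) / threadsPerBlock;
        scaleAndAdd<<<blocks, threadsPerBlock>>>(d_a, d_b, d_out, n);
    }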

There are a few technologies that make all of this possible:

1.) CUDA by NVIDIA. It seems to be the most well-known and best-documented. Unfortunately, it will only work on NVIDIA video cards. I've downloaded the SDK, tried out some of the samples, and there's some awesome stuff being done in CUDA. But the fact that it's limited to NVIDIA cards makes me question its future.

2.) Stream by ATI. ATI's equivalent to CUDA. As you might expect, it will only work on ATI cards.

3.) OpenCL - The Khronos Group has put together this standard, but it's still in its infancy. I like the idea of OpenCL, though. The hope is that it will be supported by most video card manufacturers and should make cross-vendor development that much easier.

But what other technologies for non-graphical GPU programming are coming, and which shows the most promise? And do you see, or would you like to see, these technologies being built into some of the mainstream development frameworks like .NET to make it that much easier?

16 answers
Deceive 欺骗
#2 · 2019-03-08 20:34

I expect the same things that CPUs are used for?

I just mean this seems like a gimmick to me. I hesitate to say "that's going nowhere" when it comes to technology, but a GPU's primary function is graphics rendering and a CPU's primary function is all other processing. Having the GPU do anything else just seems wacky.

对你真心纯属浪费
#3 · 2019-03-08 20:37

It's important to keep in mind that even tasks that are inherently serial can benefit from parallelization if they must be performed many times independently.
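
For example (a hypothetical CUDA sketch, not code from any particular application): Newton's method for a square root is a serial loop, but if you need thousands of independent square roots, you can give each thread its own instance and the GPU stays busy.

    #include <cuda_runtime.h>

    // Each thread runs a small *serial* Newton iteration to compute sqrt(x[i]).
    // The individual task is sequential, but many independent instances of it
    // execute in parallel, one per thread.
    __global__ void batchNewtonSqrt(const float* x, float* result, int n)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i >= n) return;

        if (x[i] <= 0.0f) { result[i] = 0.0f; return; }

        float guess = 0.5f * (x[i] + 1.0f);      // crude starting point
        for (int iter = 0; iter < 20; ++iter)    // serial inner loop
            guess = 0.5f * (guess + x[i] / guess);
        result[i] = guess;
    }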

Also, bear in mind that whenever anyone reports the speedup of a GPU implementation over a CPU implementation, it is almost never a fair comparison. To be truly fair, the implementers must first spend the time to create a truly optimized, parallel CPU implementation. A single Intel Core i7 965 XE CPU can achieve around 70 gigaflops in double precision today. Current high-end GPUs can do 70-80 gigaflops in double precision and around 1000 in single precision. Thus a speedup of more than about 15x may simply imply an inefficient CPU implementation.

One important caveat with GPU computing is that it is currently "small scale". With a supercomputing facility, you can run a parallelized algorithm on hundreds or even thousands of CPU cores. In contrast, GPU "clusters" are currently limited to about 8 GPUs connected to one machine. Of course, several of these machines can be combined, but this adds complexity, as the data must pass not only between computers but also between GPUs. Also, there isn't yet an MPI equivalent that lets processes transparently scale to multiple GPUs across multiple machines; it must be implemented manually (possibly in combination with MPI).
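
A rough sketch of what that manual work looks like (the MPI and CUDA calls are real, but the structure is illustrative and assumes one GPU per MPI process):

    #include <mpi.h>
    #include <cuda_runtime.h>

    // Exchange a buffer with a neighbouring rank: data must be staged through
    // host memory, sent over MPI, and copied back down to the other GPU.
    void exchangeBuffer(float* d_sendBuf, float* d_recvBuf, int count,
                        int neighborRank, MPI_Comm comm)
    {
        float* h_send = new float[count];
        float* h_recv = new float[count];

        cudaMemcpy(h_send, d_sendBuf, count * sizeof(float), cudaMemcpyDeviceToHost);

        MPI_Sendrecv(h_send, count, MPI_FLOAT, neighborRank, 0,
                     h_recv, count, MPI_FLOAT, neighborRank, 0,
                     comm, MPI_STATUS_IGNORE);

        cudaMemcpy(d_recvBuf, h_recv, count * sizeof(float), cudaMemcpyHostToDevice);

        delete[] h_send;
        delete[] h_recv;
    }

    int main(int argc, char** argv)
    {
        MPI_Init(&argc, &argv);
        int rank = 0, deviceCount = 1;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        cudaGetDeviceCount(&deviceCount);
        cudaSetDevice(rank % deviceCount);   // pin each MPI process to one GPU
        // ... allocate device buffers, run kernels, call exchangeBuffer() ...
        MPI_Finalize();
        return 0;
    }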

Aside from this problem of scale, the other major limitation of GPUs for parallel computing is the severe restriction on memory access patterns. Random memory access is possible, but carefully planned memory access will result in many-fold better performance.
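
To illustrate (another hypothetical sketch), both kernels below move the same amount of data, but the first lets adjacent threads read adjacent addresses, which the hardware can coalesce into a few wide transactions; the second reads through an arbitrary index table and typically runs many times slower.

    #include <cuda_runtime.h>

    // Coalesced: thread i reads element i, so a warp touches one contiguous
    // block of memory and its loads can be combined.
    __global__ void copyCoalesced(const float* in, float* out, int n)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) out[i] = in[i];
    }

    // Scattered: each thread reads through an index table, so loads from one
    // warp hit unrelated locations and cannot be combined.
    __global__ void copyGather(const float* in, const int* indices, float* out, int n)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) out[i] = in[indices[i]];
    }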

Perhaps the most promising upcoming contender is Intel's Larrabee. It has considerably better access to the CPU, to system memory, and, perhaps most importantly, to a cache. This should give it considerable advantages with many algorithms. If it can't match the massive memory bandwidth of current GPUs, though, it may lag behind the competition for algorithms that make optimal use of that bandwidth.

The current generation of hardware and software requires a lot of developer effort to get optimal performance. This often includes restructuring algorithms to make efficient use of the GPU memory. It also often involves experimenting with different approaches to find the best one.

Note also that this effort to get optimal performance is usually necessary to justify using GPU hardware at all. The difference between a naive implementation and an optimized implementation can be an order of magnitude or more, which means that an optimized CPU implementation will likely be as good as, or even better than, a naive GPU implementation.
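
As a concrete (and again hypothetical) sketch of that gap, compare a naive sum with one restructured around the GPU's fast on-chip shared memory. Both assume 256-thread blocks and hardware that supports atomicAdd on floats.

    #include <cuda_runtime.h>

    // Naive: every thread adds its element straight into one global counter,
    // so the atomic becomes a serial bottleneck.
    __global__ void sumNaive(const float* in, float* total, int n)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) atomicAdd(total, in[i]);
    }

    // Restructured: each block first reduces its 256 elements in shared
    // memory, then issues a single atomic per block.
    __global__ void sumShared(const float* in, float* total, int n)
    {
        __shared__ float partial[256];          // requires blockDim.x == 256
        int tid = threadIdx.x;
        int i = blockIdx.x * blockDim.x + tid;

        partial[tid] = (i < n) ? in[i] : 0.0f;
        __syncthreads();

        for (int stride = blockDim.x / 2; stride > 0; stride /= 2) {
            if (tid < stride) partial[tid] += partial[tid + stride];
            __syncthreads();
        }
        if (tid == 0) atomicAdd(total, partial[0]);
    }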

People are already working on .NET bindings for CUDA. See here. However, given the need to work at such a low level, I don't think GPU computing is ready for the masses yet.

地球回转人心会变
#4 · 2019-03-08 20:40

I have heard a great deal of talk about turning what today are GPUs into more general-purpose "array processor units", for use with any matrix math problem, rather than just graphics processing. I haven't seen much come of it yet, though.

The theory was that array processors might follow roughly the same trajectory that floating-point processors followed a couple of decades before. Originally, floating-point processors were expensive add-on options for PCs that not a lot of people bothered to buy. Eventually they became so vital that they were put into the CPU itself.

趁早两清
#5 · 2019-03-08 20:40

Your perception that GPUs are faster than CPUs is based on the misconception created by a few embarrassingly parallel applications applied to the likes of the PS3, NVIDIA and ATI hardware.

http://en.wikipedia.org/wiki/Embarrassingly_parallel

Most real-world challenges are not easily decomposable into these types of tasks. The desktop CPU is far better suited to this type of challenge, from both a feature-set and a performance standpoint.

狗以群分
#6 · 2019-03-08 20:41

I foresee that this technology will become popular and mainstream, but it will take some time to do so. My guess is about 5 to 10 years.

As you correctly noted, one major obstacle to the adoption of the technology is the lack of a common library that runs on most adapters - both ATI and NVIDIA. Until this is solved to an acceptable degree, the technology will not enter the mainstream and will stay in the niche of custom-made applications that run on specific hardware.

As for integrating it with C# and other high-level managed languages - this will take a bit longer, but XNA already demonstrates that custom shaders and managed environment can mix together - to a certain degree. Of course, the shader code is still not in C#, and there are several major obstacles to doing so.

One of the main reasons GPU code executes so quickly is that it has severe limitations on what the code can and cannot do, and it uses VRAM instead of ordinary RAM. This makes it difficult to bring CPU code and GPU code together. While workarounds are possible, they would practically negate the performance gain.
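
For instance (an illustrative CUDA sketch), every trip across the CPU/GPU boundary looks roughly like the function below; unless the kernel in the middle does substantial work, the two copies between RAM and VRAM dominate and the speedup evaporates.

    #include <cuda_runtime.h>

    __global__ void doubleElements(float* data, int n)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) data[i] *= 2.0f;
    }

    // The kernel is trivial, so the two copies across the PCIe bus dwarf it;
    // casually mixing CPU and GPU code pays this cost at every boundary.
    void doubleOnGpu(float* hostData, int n)
    {
        float* d_data = 0;
        size_t bytes = n * sizeof(float);

        cudaMalloc((void**)&d_data, bytes);
        cudaMemcpy(d_data, hostData, bytes, cudaMemcpyHostToDevice);  // RAM -> VRAM

        int threads = 256;
        doubleElements<<<(n + threads - 1) / threads, threads>>>(d_data, n);

        cudaMemcpy(hostData, d_data, bytes, cudaMemcpyDeviceToHost);  // VRAM -> RAM
        cudaFree(d_data);
    }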

One possible solution that I see is a sub-language for C# that has these limitations, is compiled to GPU code, and has a strictly defined way of communicating with the usual C# code. However, this would not be much different from what we already have - just more comfortable to write, thanks to some syntactic sugar and standard library functions. Still, this too is a long way off.

chillily
#7 · 2019-03-08 20:43

Another technology that's coming for GPU-based processing is GPU versions of existing high-level computational libraries. Not very flashy, I know, but it has significant advantages for portable code and ease of programming.

For example, AMD's Stream 2.0 SDK includes a version of their BLAS (linear algebra) library with some of the computations implemented on the GPU. The API is exactly the same as their CPU-only version of the library that they've shipped for years and years; all that's needed is relinking the application, and it uses the GPU and runs faster.
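
To make the "just relink it" point concrete, here's what such a call looks like through the portable CBLAS interface (a sketch; I'm not reproducing AMD's exact headers). The source doesn't change at all; whether the multiply runs on the CPU or gets offloaded to the GPU is decided entirely by which BLAS library you link against.

    #include <cblas.h>   // standard CBLAS interface; swap the BLAS library at link
                         // time, CPU or GPU-accelerated, without touching this code

    // C = A * B, all matrices stored row-major: A is m x k, B is k x n, C is m x n.
    void matmul(const double* A, const double* B, double* C, int m, int n, int k)
    {
        cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
                    m, n, k,
                    1.0, A, k,
                         B, n,
                    0.0, C, n);
    }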

Similarly, Dan Campbell at GTRI has been working on a CUDA implementation of the VSIPL standard for signal processing. (In particular, the sort of signal and image processing that's common in radar systems and related things like medical imaging.) Again, that's a standard interface, and applications that have been written for VSIPL implementations on other processors can simply be recompiled with this one and use the GPU's capability where appropriate.

In practice, quite a lot of high-performance numerical programs these days do not do their own low-level programming but rely on libraries. On Intel hardware, if you're doing number-crunching, it's generally hard to beat the Intel math libraries (MKL) for most of what they implement -- and using them means you get the advantages of all the vector instructions and clever tricks in newer x86 processors, without having to specialize your code for them. With things like GPUs, I suspect this will become even more prevalent.

So I think a technology to watch is the development of general-purpose libraries that form core building blocks for applications in specific domains, in ways that capture parts of those algorithms that can be efficiently sent off to the GPU while minimizing the amount of nonportable GPU-specific cleverness required from the programmer.

(Bias disclaimer: My company has also been working on a CUDA port of our VSIPL++ library, so I'm inclined to think this is a good idea!)

Also, in an entirely different direction, you might want to check out some of the things that RapidMind is doing. Their platform was initially intended for multicore CPU-type systems, but they've been doing a good bit of work extending it to GPU computations as well.
