Your CPU may be a quad-core, but did you know that some graphics cards today have over 200 cores? We've already seen what the GPUs in today's graphics cards can do for rendering. Now they can be used for non-graphical tasks as well, and in my opinion the results are nothing short of amazing. An algorithm that lends itself well to parallelism has the potential to be much, much faster on a GPU than it could ever be on a CPU.
There are a few technologies that make all of this possible:
1.) CUDA by NVidia. It seems to be the most well-known and best-documented (there's a small sketch of what CUDA code looks like after this list). Unfortunately, it'll only work on NVidia video cards. I've downloaded the SDK, tried out some of the samples, and there's some awesome stuff being done in CUDA. But the fact that it's limited to NVidia cards makes me question its future.
2.) Stream by ATI. ATI's equivalent to CUDA. As you might expect, it will only work on ATI cards.
3.) OpenCL - The Khronos Group has put together this standard, but it's still in its infancy. I like the idea of OpenCL, though. The hope is that it will be supported by most video card manufacturers and should make cross-vendor development that much easier.
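To make that a bit more concrete, here's a minimal sketch of the kind of data-parallel code these technologies target, written in CUDA since it's the best documented of the three. The kernel and variable names are my own, and it assumes the CUDA toolkit is installed; each thread handles exactly one element of the arrays.

    #include <cuda_runtime.h>
    #include <cstdio>
    #include <cstdlib>

    // Each thread adds one pair of elements.
    __global__ void vectorAdd(const float *a, const float *b, float *c, int n)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;   // global thread index
        if (i < n)
            c[i] = a[i] + b[i];
    }

    int main()
    {
        const int n = 1 << 20;                  // ~1 million floats
        const size_t bytes = n * sizeof(float);

        // Host buffers.
        float *hA = (float *)malloc(bytes);
        float *hB = (float *)malloc(bytes);
        float *hC = (float *)malloc(bytes);
        for (int i = 0; i < n; ++i) { hA[i] = 1.0f; hB[i] = 2.0f; }

        // Device buffers, plus copies of the inputs.
        float *dA, *dB, *dC;
        cudaMalloc(&dA, bytes);
        cudaMalloc(&dB, bytes);
        cudaMalloc(&dC, bytes);
        cudaMemcpy(dA, hA, bytes, cudaMemcpyHostToDevice);
        cudaMemcpy(dB, hB, bytes, cudaMemcpyHostToDevice);

        // Launch one thread per element, 256 threads per block.
        const int threads = 256;
        const int blocks = (n + threads - 1) / threads;
        vectorAdd<<<blocks, threads>>>(dA, dB, dC, n);

        // Copy the result back and check one value.
        cudaMemcpy(hC, dC, bytes, cudaMemcpyDeviceToHost);
        printf("c[0] = %f\n", hC[0]);           // expect 3.0

        cudaFree(dA); cudaFree(dB); cudaFree(dC);
        free(hA); free(hB); free(hC);
        return 0;
    }

The OpenCL and Stream SDKs ship samples built around the same element-wise pattern; as far as I can tell, it's mostly the host-side setup API that differs.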
But what other technologies for non-graphical GPU programming are coming and what shows the most promise? And do you see or would you like to see these technologies being built into some of the mainstream development frameworks like .NET to make it that much easier?
Pretty much anything that can be parallelized may be able to benefit. More specific examples would be SETI@home, Folding@home, and other distributed computing projects, as well as scientific computing.
Especially things that rely heavily on floating point arithmetic, because GPUs have specialized circuitry that is VERY fast at floating point operations. That means a GPU isn't as versatile, but it's VERY good at what it does do.
If you want to look at more dedicated GPU processing, check out Nvidia's Tesla GPU. It's a GPU, but it doesn't actually have a monitor output!
I doubt we will see much GPU processing on the common desktop, at least for a while, because not everyone has a CUDA-capable (or similar) graphics card, if they have a dedicated graphics card at all. It's also very difficult to make programs more parallel. Games could possibly utilize this extra power, but it will be very difficult and probably won't be too useful, since graphics calculations are mostly already on the GPU and the remaining work is on the CPU, and has to stay on the CPU because of the instruction sets involved.
GPU processing, at least for a while, will be for very specific niche markets that need a lot of floating point computation.
GPUs may or may not remain as popular as they are now, but the basic idea is becoming a rather popular approach to high-performance processing. One trend coming up now is the external "accelerator" that aids the CPU with large floating point jobs. A GPU is just one type of accelerator.
Intel is releasing a new accelerator called the Xeon Phi, which they're hoping can challenge the GPU as an HPC accelerator. The Cell processor took a similar approach, with one main CPU doing general tasks and compute-intensive work offloaded to other processing elements, achieving some impressive speeds.
Accelerators in general seem to be of interest at the moment, so they should be around for a while at least. Whether or not the GPU remains as the de facto accelerator remains to be seen.
GPUs work well on problems with a high level of Data Level Parallelism, which essentially means there is a way to partition the data so that each piece can be processed independently of the others.
GPUs aren't inherently faster at the clock speed level. In fact, I'm relatively sure the clock speed of the shaders (or maybe they have a more GPGPU term for them these days?) is quite slow compared to the ALUs on a modern desktop processor. The thing is, a GPU has an absolutely enormous number of these shaders, turning the GPU into a very large SIMD processor. With the number of shaders on a modern GeForce, for example, it's possible for a GPU to be working on several hundred (thousand?) floating point numbers at once.
In short, a GPU can be amazingly fast for problems where you can partition the data properly and process the partitions independently. It's not so powerful for Task (thread) Level Parallelism.
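Here's a rough sketch of that data partitioning in CUDA (the kernel name and launch parameters are just illustrative): a grid-stride SAXPY kernel lets every thread sweep the array at a stride equal to the total thread count, so thousands of identical floating point operations are in flight at once.

    // Sketch only: each thread processes every (gridDim.x * blockDim.x)-th element,
    // so the whole grid covers the array with thousands of independent
    // multiply-adds in flight at once -- the Data Level Parallelism described above.
    __global__ void saxpy(int n, float a, const float *x, float *y)
    {
        for (int i = blockIdx.x * blockDim.x + threadIdx.x;
             i < n;
             i += blockDim.x * gridDim.x)               // grid-stride loop
        {
            y[i] = a * x[i] + y[i];
        }
    }

    // Host side (assuming d_x and d_y are already allocated and filled on the device):
    //   saxpy<<<128, 256>>>(n, 2.0f, d_x, d_y);        // 32,768 threads share the work

Note that the partitions never need to communicate, which is why this maps so well onto the SIMD-style shader array; as soon as threads have to coordinate or branch in different directions, the advantage shrinks.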
I'll repeat the answer I gave here.
Long-term I think that the GPU will cease to exist, as general purpose processors evolve to take over those functions. Intel's Larrabee is the first step. History has shown that betting against x86 is a bad idea.