I understand how to write OpenGL/DirectX programs, and I know the maths and the conceptual stuff behind it, but I'm curious how the GPU-CPU communication works on a low level.
Say I've got an OpenGL program written in C that displays a triangle and rotates the camera by 45 degrees. When I compile this program, will it be turned into a series of ioctl calls, with the GPU driver then sending the appropriate commands to the GPU, where all the logic of rotating the triangle and setting the appropriate pixels to the appropriate color is wired in? Or will the program be compiled into a "GPU program" which is loaded onto the GPU and computes the rotation etc.? Or something completely different?
Edit: A few days later I found this article series, which basically answers the question: http://fgiesen.wordpress.com/2011/07/01/a-trip-through-the-graphics-pipeline-2011-part-1/
This question is almost impossible to answer because OpenGL by itself is just a front-end API, and as long as an implementation adheres to the specification and the outcome conforms to it, it can be done any way you like.
The question may have been: "How does an OpenGL driver work on the lowest level?" Now this is again impossible to answer in general, as a driver is closely tied to some piece of hardware, which may again do things however the developer designed it.
So the question should have been: "How does it look on average behind the scenes of OpenGL and the graphics system?". Let's look at this from the bottom up:
1. At the lowest level there's some graphics device. Nowadays these are GPUs, which provide a set of registers controlling their operation (which registers exactly is device dependent), have some program memory for shaders, bulk memory for input data (vertices, textures, etc.) and an I/O channel to the rest of the system over which they receive and send data and command streams.
2. The graphics driver keeps track of the GPU's state and of all the resources of the application programs that make use of the GPU. It is also responsible for converting or otherwise processing the data sent by applications (converting textures into the pixel format supported by the GPU, compiling shaders into the machine code of the GPU). Furthermore it provides some abstract, driver-dependent interface to application programs.
3. Then there's the driver-dependent OpenGL client library/driver. On Windows this gets loaded by proxy through opengl32.dll, on Unix systems this resides in two places:
   On MacOS X this happens to be the "OpenGL Framework".
   It is this part that translates OpenGL calls, as you make them, into calls to the driver-specific functions in the part of the driver described in (2).
4. Finally the actual OpenGL API library, opengl32.dll on Windows and /usr/lib/libGL.so on Unix; this mostly just passes the commands down to the OpenGL implementation proper.
How the actual communication happens can not be generalized:
In Unix the 3<->4 connection may happen either over sockets (yes, it may, and does, go over the network if you want it to) or through shared memory. In Windows the interface library and the driver client are both loaded into the process address space, so that's not so much communication as simple function calls and variable/pointer passing. In MacOS X this is similar to Windows, only that there's no separation between OpenGL interface and driver client (that's the reason why MacOS X is so slow to keep up with new OpenGL versions: it always requires a full operating system upgrade to deliver the new framework).
Communication between 3<->2 may go through ioctl, read/write, or through mapping some memory into the process address space and configuring the MMU to trigger some driver code whenever that memory is changed. This is quite similar on any operating system since you always have to cross the kernel/userland boundary: ultimately you go through some syscall.
Communication between system and GPU happens through the peripheral bus and the access methods it defines, so PCI, AGP, PCI-E, etc., which work through port I/O, memory-mapped I/O, DMA and IRQs.
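To make the 3<->2 step a bit more concrete, here is a minimal sketch in plain C of one common shape for it: the client library appends commands to a ring buffer in memory both sides can see, and the driver drains it when a syscall kicks it. Everything here is an assumption for illustration: the opcodes, the `cmd_ring` type and the `ring_push`/`ring_submit` names are hypothetical, and `ring_submit` is only a stand-in for what a real kernel driver would do after an ioctl. No actual driver is claimed to work exactly like this.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define RING_SIZE 8u /* number of slots; kept a power of two */

/* Hypothetical command opcodes, not taken from any real GPU. */
enum { CMD_SET_MATRIX = 1, CMD_DRAW = 2 };

typedef struct {
    uint32_t slots[RING_SIZE];
    uint32_t head; /* next slot the producer (client library) writes */
    uint32_t tail; /* next slot the consumer (driver) reads */
} cmd_ring;

/* Client-library side: append one command.
   Returns 1 on success, 0 if the ring is full. */
int ring_push(cmd_ring *r, uint32_t cmd)
{
    assert(r != NULL);
    if (r->head - r->tail == RING_SIZE)
        return 0;
    r->slots[r->head % RING_SIZE] = cmd;
    r->head++;
    return 1;
}

/* Driver side: stand-in for the code that would run after a syscall
   such as ioctl. Copies up to max pending commands into out, in
   submission order, and returns how many were consumed. */
unsigned ring_submit(cmd_ring *r, uint32_t *out, unsigned max)
{
    unsigned n = 0;
    assert(r != NULL && out != NULL);
    while (r->tail != r->head && n < max) {
        out[n++] = r->slots[r->tail % RING_SIZE];
        r->tail++;
    }
    return n;
}
```

Pushing `CMD_SET_MATRIX` and then `CMD_DRAW` and calling `ring_submit` hands both commands to the consumer in that order. In a real system the consumer would forward them to the GPU over the bus instead of copying them into an array.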
Your program is not compiled for any particular GPU; it is just dynamically linked against a library that will implement OpenGL. The actual implementation might involve sending OpenGL commands to the GPU, running software fallbacks, compiling shaders and sending them to the GPU, or even using shader fallbacks to OpenGL commands. The graphics landscape is fairly complicated. Thankfully linking insulates you from most of the drivers' complexity, leaving driver implementers free to use whatever techniques they see fit.
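As a rough illustration of that insulation, here is a small C sketch of a front-end library that forwards a call through a function pointer to whichever implementation was selected at load time. All names (`api_select_backend`, `api_draw_triangle`, the two backends) are made up for this example and are not part of OpenGL or of any real driver; the backends only record that they were called.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { BACKEND_NONE, BACKEND_HARDWARE, BACKEND_SOFTWARE } backend_kind;

/* Table of entry points an implementation provides (hypothetical). */
typedef struct {
    void (*draw_triangle)(void);
} backend;

static backend_kind last_used = BACKEND_NONE;

/* Pretend implementations: a real one would queue GPU commands or
   rasterize on the CPU; these only record which path was taken. */
static void hw_draw_triangle(void) { last_used = BACKEND_HARDWARE; }
static void sw_draw_triangle(void) { last_used = BACKEND_SOFTWARE; }

static const backend hardware_backend = { hw_draw_triangle };
static const backend software_backend = { sw_draw_triangle };

static const backend *current = NULL;

/* Stand-in for load-time selection of the implementation. */
void api_select_backend(int gpu_available)
{
    current = gpu_available ? &hardware_backend : &software_backend;
}

/* The entry point the application is linked against. The application
   is compiled once and never learns which backend handles the call. */
void api_draw_triangle(void)
{
    assert(current != NULL);
    current->draw_triangle();
}

/* Reports which backend handled the most recent draw call. */
backend_kind api_last_backend(void) { return last_used; }
```

The application only ever calls `api_draw_triangle()`. Whether that ends up on a hardware path or a software fallback is decided behind the table, which is the freedom driver implementers keep.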
You're not far off. Your program calls the installable client driver (which is not really a driver, it's a userspace shared library). That will use ioctl or a similar mechanism to pass data to the kernel driver.
For the next part, it depends on the hardware. Older video cards had what is called a "fixed-function pipeline". There were dedicated memory spaces in the video card for matrices, and dedicated hardware for texture lookup, blending, etc. The video driver would load the right data and flags for each of these units and then set up DMA to transfer your vertex data (position, color, texture coordinates, etc).
Newer hardware has processor cores ("shaders") inside the video card, which differ from your CPU in that they each run much slower, but there are many more of them working in parallel. For these video cards, the driver prepares program binaries to run on the GPU shaders.
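Either way, the per-vertex work is the same arithmetic. The sketch below writes it as ordinary C running on the CPU, purely as a stand-in: on fixed-function hardware the driver would load the matrix into the card's matrix memory, and on newer hardware a shader would do the multiply. The 2D simplification, the type `vec2` and the names `rot45`, `transform_vertex` and `transform_all` are assumptions made for this example.

```c
#include <stddef.h>

typedef struct { float x, y; } vec2;

/* cos(45 deg) = sin(45 deg), approximately 0.70710678 */
#define COS45 0.70710678f

/* 2x2 matrix for a counter-clockwise rotation by 45 degrees. */
const float rot45[2][2] = {
    { COS45, -COS45 },
    { COS45,  COS45 },
};

/* The per-vertex step: multiply one vertex by the matrix. */
vec2 transform_vertex(const float m[2][2], vec2 v)
{
    vec2 out;
    out.x = m[0][0] * v.x + m[0][1] * v.y;
    out.y = m[1][0] * v.x + m[1][1] * v.y;
    return out;
}

/* Apply the same matrix to every vertex. A GPU would run this step
   for many vertices in parallel; here it is a plain loop. */
void transform_all(const float m[2][2], const vec2 *in, vec2 *out, size_t n)
{
    size_t i;
    for (i = 0; i < n; i++)
        out[i] = transform_vertex(m, in[i]);
}
```

With this matrix the vertex (1, 0) lands at roughly (0.7071, 0.7071). The point is that the same small computation is repeated independently for every vertex, which is why many slow cores working in parallel suit it well.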
C/C++ compilers/linkers do exactly one thing: they convert text files into a series of machine-specific opcodes that are run on the CPU. OpenGL and Direct3D are just C/C++ APIs; they cannot magically convert your C/C++ compiler/linker into a compiler/linker for the GPU.
Every line of C/C++ code you write will be executed on the CPU. Calls to OpenGL/Direct3D will call into C/C++ libraries, static or dynamic as the case may be.
The only place where a "GPU program" would come into play is if your code explicitly creates shaders. That is, if you make the API calls into OpenGL/D3D that cause the compiling and linking of shaders. To do this, you (at runtime, not C/C++ compile-time) either generate or load strings that represent shaders in some shader language. You then shove them through the shader compiler, and get back an object in that API that represents that shader. You then apply one or more shaders to a particular rendering command. Each of these steps happens explicitly at the direction of your C/C++ code, which as previously stated runs on the CPU.
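A small sketch of what that looks like from the C side, assuming a GLSL 1.20 vertex shader: to the C compiler the shader is nothing more than a string constant. The shader text, the names `u_rotation` and `a_position`, and the two helper functions are made up for this example. The OpenGL calls are only named in a comment, so the block compiles without any GL headers or a context.

```c
#include <string.h>

/* Shader source as the C compiler sees it: opaque character data.
   Nothing in here is compiled when the C program is built. */
static const char *const vertex_shader_source =
    "#version 120\n"
    "uniform mat4 u_rotation;\n"
    "attribute vec4 a_position;\n"
    "void main() {\n"
    "    gl_Position = u_rotation * a_position;\n"
    "}\n";

/* At runtime, with a GL context current, the program would hand this
   string to the driver with calls along the lines of:
     glCreateShader(GL_VERTEX_SHADER), glShaderSource, glCompileShader,
     glCreateProgram, glAttachShader, glLinkProgram, glUseProgram.
   Only then does the driver's shader compiler turn it into GPU code. */
const char *get_vertex_shader_source(void)
{
    return vertex_shader_source;
}

/* Length in bytes of the source text, as it could be passed along
   with the string. */
size_t vertex_shader_source_length(void)
{
    return strlen(vertex_shader_source);
}
```

If the string contained a syntax error, the C build would still succeed. The error would only show up at runtime, when the driver's shader compiler rejects the source.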
Many shader languages use C/C++-like syntax. But that doesn't make them equivalent to C/C++.