Depth Component of Converting from Window -> World

I'm working on a program that draws a 100x100 grid and allows the user to click on a cell and change the color.

Clicking currently works, but only when looking at the grid face-on (i.e. camPos.z equal to camLook.z) and when the grid is positioned in the center of the screen.

What I've been stuck on for the last few days is selecting the correct cell when looking at the grid from a different camera position, or when the grid sits in a different area of the screen.

My only guess would be that somehow the depth buffer does not reflect the current position of the camera or that there is some inconsistency between the buffer depth range and the near and far values of the camera. Or that the way I'm applying the projection/view matrix is ok for displaying the image, but something is going wrong when going back through the pipeline. But I can't quite figure it out.

(code updated/refactored since originally posting)

Vertex Shader:

#version 330

layout(location = 0) in vec4 position;

smooth out vec4 theColor;

uniform vec4 color;
uniform mat4 pv;

void main() {
  gl_Position = pv * position;
  theColor = color;
}

Camera class (result of projectionViewMatrix() is the pv uniform above):

Camera::Camera()
{
  camPos = glm::vec3(1.0f, 5.0f, 2.0f);
  camLook = glm::vec3(1.0f, 0.0f, 0.0f);

  fovy = 90.0f;
  aspect = 1.0f;
  near = 0.1f;
  far = 1000.0f;
}

glm::mat4 Camera::projectionMatrix()
{
  return glm::perspective(fovy, aspect, near, far);
}

glm::mat4 Camera::viewMatrix()
{
  return glm::lookAt(
    camPos,
    camLook,
    glm::vec3(0.0f, 1.0f, 0.0f)
  );
}

glm::mat4 Camera::projectionViewMatrix()
{
  return projectionMatrix() * viewMatrix();
}

// view controls

void Camera::moveForward()
{
  camPos.z -= 1.0f;
  camLook.z -= 1.0f;
}

void Camera::moveBack()
{
  camPos.z += 1.0f;
  camLook.z += 1.0f;
}

void Camera::moveLeft()
{
  camPos.x -= 1.0f;
  camLook.x -= 1.0f;
}

void Camera::moveRight()
{
  camPos.x += 1.0f;
  camLook.x += 1.0f;
}

void Camera::zoomIn()
{
  camPos.y -= 1.0f;
}

void Camera::zoomOut()
{
  camPos.y += 1.0f;
}

void Camera::lookDown()
{
  camLook.z += 0.1f;
}

void Camera::lookAtAngle()
{
  if (camLook.z != 0.0f)
    camLook.z -= 0.1f;
}

Specific function in the camera class where I am trying to get world coordinates (x and y are screen coordinates):

glm::vec3 Camera::experiment(int x, int y)
{
  GLint viewport[4];
  glGetIntegerv(GL_VIEWPORT, viewport);

  GLfloat winZ;
  glReadPixels(x, y, 1, 1, GL_DEPTH_COMPONENT, GL_FLOAT, &winZ);
  printf("DEPTH: %f\n", winZ);

  glm::vec3 pos = glm::unProject(
    glm::vec3(x, viewport[3] - y, winZ),
    viewMatrix(),
    projectionMatrix(),
    glm::vec4(0.0f, 0.0f, viewport[2], viewport[3])
  );

  printf("POS: (%f, %f, %f)\n", pos.x, pos.y, pos.z);

  return pos;
}

Initialization and display:

void init(void)
{
  glewExperimental = GL_TRUE;
  glewInit();

  glEnable(GL_DEPTH_TEST);
  glDepthMask(GL_TRUE);
  glDepthFunc(GL_LESS);
  glDepthRange(0.0f, 1.0f);

  InitializeProgram();
  InitializeVAO();
  InitializeGrid();

  glEnable(GL_CULL_FACE);
  glCullFace(GL_BACK);
  glFrontFace(GL_CW);
}

void display(void)
{
  glClearColor(0.0f, 0.0f, 0.0f, 0.0f);
  glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT);

  glUseProgram(theProgram);
  glBindVertexArray(vao);

  glUniformMatrix4fv(projectionViewMatrixUnif, 1, GL_FALSE, glm::value_ptr(camera.projectionViewMatrix()));

  DrawGrid();

  glBindVertexArray(0);
  glUseProgram(0);

  glutSwapBuffers();
  glutPostRedisplay();
}

int main(int argc, char** argv)
{
  glutInit(&argc, argv);

  glutInitDisplayMode(GLUT_RGB | GLUT_DEPTH);
  glutInitContextVersion(3, 2);
  glutInitContextProfile(GLUT_CORE_PROFILE);

  glutInitWindowSize(500, 500);
  glutInitWindowPosition(300, 200);

  glutCreateWindow("testing");

  init();

  glutDisplayFunc(display);
  glutReshapeFunc(reshape);
  glutKeyboardFunc(keyboard);
  glutMouseFunc(mouse);
  glutMainLoop();
  return 0;
}

It is actually very simple to project rays under the cursor to implement picking. It will always work with pretty much any projection and modelview matrix (except for some invalid singular cases which transform the entire scene into infinity, etc.).

I've written a small demo which uses the deprecated fixed-function pipeline for simplicity, but the code will work with shaders as well. It begins by reading the matrices from OpenGL:

glm::mat4 proj, mv;
glGetFloatv(GL_PROJECTION_MATRIX, &proj[0][0]);
glGetFloatv(GL_MODELVIEW_MATRIX, &mv[0][0]);
glm::mat4 mvp = proj * mv;

Here mvp is what you would pass to your vertex shader. Then we define two points:

glm::vec4 nearc(f_mouse_x, f_mouse_y, 0, 1);
glm::vec4 farc(f_mouse_x, f_mouse_y, 1, 1);

These are near and far cursor coordinates in normalized space (so f_mouse_x and f_mouse_y are in the [-1, 1] interval). Note that the z coordinates do not need to be 0 and 1 (in normalized device coordinates the near plane is actually at z = -1); any two different values give two points on the same ray under the cursor. Now we can use the inverse of mvp to unproject them to worldspace:
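As a minimal sketch of where f_mouse_x and f_mouse_y could come from: GLUT reports the mouse position in window pixels with the origin in the top-left corner and y pointing down, while normalized device coordinates have the origin in the center and y pointing up. The helper below is hypothetical (the names Ndc2 and mouse_to_ndc are not from the demo) and assumes the viewport starts at (0, 0) and covers the whole window.

```cpp
#include <cassert>

// Hypothetical helper type: a 2D point in normalized device coordinates.
struct Ndc2 { float x, y; };

// Map a mouse position in window pixels (origin top-left, y down) to
// normalized device coordinates in [-1, 1] (origin at the center, y up).
// Assumption: the viewport starts at (0, 0) and covers the whole window.
Ndc2 mouse_to_ndc(int mouse_x, int mouse_y, int width, int height)
{
    assert(width > 0 && height > 0);
    // + 0.5 samples the center of the pixel rather than its corner
    float x = 2.0f * (mouse_x + 0.5f) / width - 1.0f;
    float y = 1.0f - 2.0f * (mouse_y + 0.5f) / height; // flip y
    return Ndc2{x, y};
}
```

With something like this, the x and y fields of the result would play the role of f_mouse_x and f_mouse_y in nearc and farc.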

glm::mat4 inv_mvp = glm::inverse(mvp); // invert once, use for both points
nearc = inv_mvp * nearc;
nearc /= nearc.w; // homogeneous divide
farc = inv_mvp * farc;
farc /= farc.w; // homogeneous divide

Note that the homogeneous division is important here. This gives us the position of the cursor in worldspace, where your objects are defined (except when they have their own model matrices, but that is easy to incorporate).

Finally, the demo calculates the intersection of the ray through nearc and farc with the plane that carries the texture (your 100x100 grid):

// the plane with the grid: dot(p, plane_normal) + plane_d = 0
glm::vec3 plane_normal(0, 0, 1); // plane normal
float plane_d = 0; // plane distance from origin

// the ray under the mouse cursor
glm::vec3 ray_org(nearc), ray_dir(farc - nearc);
ray_dir = glm::normalize(ray_dir);

// calculate ray-plane intersection
float denom = glm::dot(ray_dir, plane_normal);
float t = 0; // stays 0 if there is no intersection (the ray is parallel to the plane)
if(fabs(denom) > 1e-5f)
    t = -(glm::dot(ray_org, plane_normal) + plane_d) / denom;
glm::vec3 isect = ray_org + t * ray_dir;

// calculate grid position in pixels (maps [-1, 1] on the plane to [0, N))
float grid_x = N * (isect.x + 1) / 2;
float grid_y = N * (isect.y + 1) / 2;
if(t && grid_x >= 0 && grid_x < N && grid_y >= 0 && grid_y < N) {
    // calculate integer coordinates
    int x = int(grid_x), y = int(grid_y);

    // change the texture to see the picked cell
    tex_data[x + N * y] = 0xff0000ff; // red
    glBindTexture(GL_TEXTURE_2D, n_texture);
    glTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, N, N, GL_RGBA, GL_UNSIGNED_BYTE, &tex_data[0]);
}
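The ray-plane step can also be tried out without OpenGL or GLM. The following is a rough standalone sketch, not the demo's code: Vec3, dot3, ray_plane and hit_to_cell are hypothetical names, the functions return a bool instead of using t == 0 as the "no hit" marker, and the grid is assumed to span [-1, 1] in x and y on the plane, as in the mapping above.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical minimal vector type, standing in for glm::vec3.
struct Vec3 { float x, y, z; };

static float dot3(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

// Intersect the ray org + t * dir with the plane dot(p, normal) + d = 0.
// Returns false if the ray is (nearly) parallel to the plane.
bool ray_plane(Vec3 org, Vec3 dir, Vec3 normal, float d, Vec3 &hit)
{
    float denom = dot3(dir, normal);
    if (std::fabs(denom) < 1e-5f)
        return false;
    float t = -(dot3(org, normal) + d) / denom;
    hit = Vec3{org.x + t * dir.x, org.y + t * dir.y, org.z + t * dir.z};
    return true;
}

// Map a hit point to integer cell indices of an n x n grid.
// Assumption: the grid spans [-1, 1] in both x and y.
// Returns false if the point lies outside the grid.
bool hit_to_cell(Vec3 hit, int n, int &cell_x, int &cell_y)
{
    assert(n > 0);
    float gx = n * (hit.x + 1) / 2;
    float gy = n * (hit.y + 1) / 2;
    if (gx < 0 || gx >= n || gy < 0 || gy >= n)
        return false;
    cell_x = int(gx);
    cell_y = int(gy);
    return true;
}
```

For example, a ray starting at (0.25, -0.5, 5) and pointing along -z meets the plane z = 0 at (0.25, -0.5, 0), which falls into cell (12, 5) of a 20x20 grid.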

The output is fairly nice:

This is only a 20x20 texture, but it is trivial to go up to 100x100. You can get the full demo source and precompiled win32 binaries here. It depends on GLM. You can turn with the mouse and move with WASD.

Objects more complicated than planes are possible too; it is essentially raytracing. Using the depth component under the cursor (window z) is just as simple, only beware of the coordinate ranges: the depth buffer stores values in [0, 1], while normalized device coordinates are in [-1, 1]. Also note that reading back the z value may hurt performance, as it requires CPU / GPU synchronization.
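To illustrate the range difference, here is a small sketch of the two conversions involved. The function names are hypothetical, and the formulas assume the default glDepthRange(0, 1) and a standard OpenGL perspective projection with NDC z in [-1, 1] (what glm::perspective produces unless GLM is configured for a [0, 1] depth range).

```cpp
#include <cassert>

// Depth-buffer value in [0, 1] -> NDC z in [-1, 1].
// Assumption: the default glDepthRange(0, 1) is in effect.
float depth_to_ndc_z(float win_z)
{
    return 2.0f * win_z - 1.0f;
}

// NDC z -> positive distance from the camera along the view direction.
// Assumption: a standard perspective projection with the given near and
// far planes; the result is z_near at ndc_z = -1 and z_far at ndc_z = 1.
float ndc_z_to_eye_distance(float ndc_z, float z_near, float z_far)
{
    assert(z_near > 0 && z_far > z_near);
    return 2.0f * z_near * z_far / (z_far + z_near - ndc_z * (z_far - z_near));
}
```

The mapping is strongly nonlinear: with near = 1 and far = 100, a depth-buffer value of 0.5 (NDC z = 0) already corresponds to a distance of only about 1.98. Also keep in mind that glReadPixels counts rows from the bottom of the window, so a mouse y coordinate reported by GLUT (counted from the top) has to be flipped before it is used to read the depth.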