Screen-to-World coordinate conversion in OpenGL ES

Posted 2020-07-13 06:44

The Screen-to-world problem on the iPhone

I have a 3D model (a cube) rendered in an EAGLView, and I want to detect when I am touching the center of a given face of the cube, from any orientation angle. It sounds pretty easy, but it is not...

The problem:
How do I accurately relate screen coordinates (the touch point) to world coordinates (a location in OpenGL 3D space)? Converting a given point into a 'percentage' of the screen/world axis might seem the logical fix, but problems arise when I need to zoom or rotate the 3D space. Note: rotating and zooming in and out of the 3D space changes the relationship between the 2D screen coordinates and the 3D world coordinates. You also have to allow for the 'distance' between the viewpoint and objects in 3D space. At first this might seem like an 'easy task', but that changes once you actually examine the requirements. I've found no examples of people doing this on the iPhone. How is this normally done?

An 'easy' task?:
Sure, one might write an API to act as a go-between for screen and world coordinates, but creating such a framework would require some serious design and would likely take 'time' to do. It is NOT something one person can do in 4 hours, and 4 hours happens to be my deadline.

The question:

  • What are some of the simplest ways to know if I touched specific locations in 3D space in the iPhone OpenGL ES world?

5 answers
趁早两清
#2 · 2020-07-13 07:17

You need the OpenGL projection and modelview matrices. Multiply them to obtain the modelview-projection matrix, then invert it to get a matrix that transforms clip-space coordinates into world coordinates. Transform your touch point so that it corresponds to clip coordinates: the center of the screen should be zero, and the edges should be at -1 and +1 for both X and Y.

Construct two points in clip space, one on the near plane at (touch_x, touch_y, -1, 1) and one on the far plane at (touch_x, touch_y, 1, 1), and transform both by the inverse modelview-projection matrix.

Divide each transformed point by its w component to undo the perspective divide.

You should get two points describing a line from the camera's near plane, through the touch point, into "the far distance" (the far plane).
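
The unprojection steps above can be sketched roughly as follows. This is a minimal, self-contained illustration and not code from any particular library: Mat4, Vec4, Ray, multiply, transform, invert, perspective and touchToRay are all hypothetical names, matrices are assumed to be column-major as in OpenGL, and the touch point is assumed to be in pixels with the origin at the top-left of the view.

```cpp
#include <array>
#include <cmath>
#include <utility>

// Column-major 4x4 matrix (OpenGL layout): element (row r, column c) is m[c * 4 + r].
using Mat4 = std::array<double, 16>;
struct Vec4 { double x, y, z, w; };
struct Ray { Vec4 nearPoint, farPoint; };

// Matrix product a * b.
Mat4 multiply(const Mat4& a, const Mat4& b) {
    Mat4 r{};
    for (int c = 0; c < 4; ++c)
        for (int row = 0; row < 4; ++row)
            for (int k = 0; k < 4; ++k)
                r[c * 4 + row] += a[k * 4 + row] * b[c * 4 + k];
    return r;
}

// Matrix-vector product m * v.
Vec4 transform(const Mat4& m, const Vec4& v) {
    return {m[0] * v.x + m[4] * v.y + m[8] * v.z + m[12] * v.w,
            m[1] * v.x + m[5] * v.y + m[9] * v.z + m[13] * v.w,
            m[2] * v.x + m[6] * v.y + m[10] * v.z + m[14] * v.w,
            m[3] * v.x + m[7] * v.y + m[11] * v.z + m[15] * v.w};
}

// Gauss-Jordan inversion with partial pivoting; returns false for a singular matrix.
bool invert(const Mat4& in, Mat4& out) {
    double a[4][8];
    for (int r = 0; r < 4; ++r)
        for (int c = 0; c < 4; ++c) {
            a[r][c] = in[c * 4 + r];
            a[r][c + 4] = (r == c) ? 1.0 : 0.0;
        }
    for (int col = 0; col < 4; ++col) {
        int pivot = col;
        for (int r = col + 1; r < 4; ++r)
            if (std::fabs(a[r][col]) > std::fabs(a[pivot][col])) pivot = r;
        if (std::fabs(a[pivot][col]) < 1e-12) return false;
        for (int c = 0; c < 8; ++c) std::swap(a[col][c], a[pivot][c]);
        const double inv = 1.0 / a[col][col];
        for (int c = 0; c < 8; ++c) a[col][c] *= inv;
        for (int r = 0; r < 4; ++r) {
            if (r == col) continue;
            const double f = a[r][col];
            for (int c = 0; c < 8; ++c) a[r][c] -= f * a[col][c];
        }
    }
    for (int r = 0; r < 4; ++r)
        for (int c = 0; c < 4; ++c) out[c * 4 + r] = a[r][c + 4];
    return true;
}

// Symmetric perspective projection, included only so the sketch can be exercised.
Mat4 perspective(double fovYRadians, double aspect, double zNear, double zFar) {
    const double f = 1.0 / std::tan(fovYRadians / 2.0);
    Mat4 m{};
    m[0] = f / aspect;
    m[5] = f;
    m[10] = (zFar + zNear) / (zNear - zFar);
    m[11] = -1.0;
    m[14] = 2.0 * zFar * zNear / (zNear - zFar);
    return m;
}

// Builds the picking ray for a touch point (pixels, origin top-left: an assumption).
bool touchToRay(double touchX, double touchY, double width, double height,
                const Mat4& projection, const Mat4& modelview, Ray& ray) {
    Mat4 inverse;
    if (!invert(multiply(projection, modelview), inverse)) return false;
    // Pixels -> [-1, 1]; y is flipped because screen y grows downward.
    const double x = touchX / width * 2.0 - 1.0;
    const double y = 1.0 - touchY / height * 2.0;
    Vec4 n = transform(inverse, {x, y, -1.0, 1.0});  // point on the near plane
    Vec4 f = transform(inverse, {x, y, 1.0, 1.0});   // point on the far plane
    if (n.w == 0.0 || f.w == 0.0) return false;
    // Divide by w to get ordinary 3D positions.
    ray.nearPoint = {n.x / n.w, n.y / n.w, n.z / n.w, 1.0};
    ray.farPoint = {f.x / f.w, f.y / f.w, f.z / f.w, 1.0};
    return true;
}
```

With an identity modelview matrix, a touch in the middle of the view should give a ray running straight down the negative z axis from the near plane to the far plane.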

Do picking based on simplified bounding boxes of your models. You should be able to find ray/box intersection algorithms aplenty on the web.
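
For the bounding-box test mentioned above, one common choice is the "slab" method for a ray against an axis-aligned box. The sketch below is only an illustration under that assumption; Vec3 and rayIntersectsBox are hypothetical names, and the box is assumed to be axis-aligned in the same space as the ray.

```cpp
#include <algorithm>
#include <limits>
#include <utility>

struct Vec3 { double x, y, z; };

// Slab test for a ray (origin + t * dir, t >= 0) against an axis-aligned box.
// On a hit, tHit receives the entry parameter t (0 if the origin is inside the box).
bool rayIntersectsBox(const Vec3& origin, const Vec3& dir,
                      const Vec3& boxMin, const Vec3& boxMax, double& tHit) {
    const double o[3] = {origin.x, origin.y, origin.z};
    const double d[3] = {dir.x, dir.y, dir.z};
    const double lo[3] = {boxMin.x, boxMin.y, boxMin.z};
    const double hi[3] = {boxMax.x, boxMax.y, boxMax.z};
    double tMin = 0.0;
    double tMax = std::numeric_limits<double>::infinity();
    for (int i = 0; i < 3; ++i) {
        if (d[i] == 0.0) {
            // Ray is parallel to this pair of planes: it must start between them.
            if (o[i] < lo[i] || o[i] > hi[i]) return false;
            continue;
        }
        double t0 = (lo[i] - o[i]) / d[i];
        double t1 = (hi[i] - o[i]) / d[i];
        if (t0 > t1) std::swap(t0, t1);
        tMin = std::max(tMin, t0);  // latest entry across all slabs
        tMax = std::min(tMax, t1);  // earliest exit across all slabs
        if (tMin > tMax) return false;
    }
    tHit = tMin;
    return true;
}
```

The ray origin and direction would come from the two unprojected points (direction = far point minus near point); when several boxes are hit, the one with the smallest tHit is the one nearest the viewer.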

Another solution is to paint each of the models in a slightly different color into an offscreen buffer and read the color at the touch point from there, which tells you which model was touched.

Here's source for a cursor I wrote for a little project using bullet physics:

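    // Map the mouse position from pixels to [-1, 1]; y is flipped because screen y grows downward.
    float x=((float)mpos.x/screensize.x)*2.0f -1.0f;
    float y=((float)mpos.y/screensize.y)*-2.0f +1.0f;
    // Unproject the point on the far plane (z = 1), then divide by w.
    p2=renderer->camera.unProject(vec4(x,y,1.0f,1));
    p2/=p2.w;
    // Start the ray at the camera position (col_t is presumably the view's translation column),
    // nudged slightly toward p2.
    vec4 pos=activecam.GetView().col_t;
    p1=pos+(((vec3)p2 - (vec3)pos) / 2048.0f * 0.1f);
    p1.w=1.0f;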

    btCollisionWorld::ClosestRayResultCallback rayCallback(btVector3(p1.x,p1.y,p1.z),btVector3(p2.x,p2.y,p2.z));
    game.dynamicsWorld->rayTest(btVector3(p1.x,p1.y,p1.z),btVector3(p2.x,p2.y,p2.z), rayCallback);
    if (rayCallback.hasHit())
    {
        btRigidBody* body = btRigidBody::upcast(rayCallback.m_collisionObject);
        if(body==game.worldBody)
        {
            renderer->setHighlight(0);
        }
        else if (body)
        {
            Entity* ent=(Entity*)body->getUserPointer();

            if(ent)
            {
                renderer->setHighlight(dynamic_cast<ModelEntity*>(ent));
                //cerr<<"hit ";
                //cerr<<ent->getName()<<endl;
            }
        }
    }
男人必须洒脱
#3 · 2020-07-13 07:25

Two solutions present themselves. Both of them should achieve the end goal, albeit by a different means: rather than answering "what world coordinate is under the mouse?", they answer the question "what object is rendered under the mouse?".

One is to draw a simplified version of your model to an off-screen buffer, rendering the center of each face using a distinct color (and adjusting the lighting so color is preserved identically). You can then detect those colors in the buffer (e.g. pixmap), and map mouse locations to them.
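
The color-coding idea above needs a reversible mapping between face identifiers and colors. A minimal sketch, assuming an 8-bit RGB off-screen buffer and reserving id 0 for the background; Rgb, idToColor and colorToId are hypothetical names:

```cpp
#include <cstdint>

struct Rgb { std::uint8_t r, g, b; };

// Packs a 24-bit identifier into an RGB color (id 0 is reserved for "nothing hit").
Rgb idToColor(std::uint32_t id) {
    return {static_cast<std::uint8_t>((id >> 16) & 0xFF),
            static_cast<std::uint8_t>((id >> 8) & 0xFF),
            static_cast<std::uint8_t>(id & 0xFF)};
}

// Recovers the identifier from a pixel read back from the off-screen buffer.
std::uint32_t colorToId(const Rgb& c) {
    return (static_cast<std::uint32_t>(c.r) << 16) |
           (static_cast<std::uint32_t>(c.g) << 8) |
           static_cast<std::uint32_t>(c.b);
}
```

In use, each face center would be drawn with idToColor(faceId) while lighting, blending, dithering and texturing are turned off so the color survives exactly; the pixel under the touch point would then be read back (for example with glReadPixels) and passed to colorToId.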

The other is to use OpenGL picking. There's a decent-looking tutorial here. The basic idea is to put OpenGL in select mode, restrict the viewport to a small (perhaps 3x3 or 5x5) window around the point of interest, and then render the scene (or a simplified version of it) using OpenGL "names" (integer identifiers) to identify the components making up each face. At the end of this process, OpenGL can give you a list of the names that were rendered in the selection viewport. Mapping these identifiers back to the original objects will let you determine what object is under the mouse cursor. Note that select mode belongs to desktop OpenGL; OpenGL ES does not provide it, so on the iPhone the off-screen color approach is the one that applies.

贼婆χ
#4 · 2020-07-13 07:28

You can now find gluUnProject in http://code.google.com/p/iphone-glu/. I've no association with the iphone-glu project and haven't tried it yet myself, just wanted to share the link.

How would you use such a function? This PDF mentions that:

The Utility Library routine gluUnProject() performs this reversal of the transformations. Given the three-dimensional window coordinates for a location and all the transformations that affected them, gluUnProject() returns the world coordinates from where it originated.

int gluUnProject(GLdouble winx, GLdouble winy, GLdouble winz, 
const GLdouble modelMatrix[16], const GLdouble projMatrix[16], 
const GLint viewport[4], GLdouble *objx, GLdouble *objy, GLdouble *objz);

Map the specified window coordinates (winx, winy, winz) into object coordinates, using transformations defined by a modelview matrix (modelMatrix), projection matrix (projMatrix), and viewport (viewport). The resulting object coordinates are returned in objx, objy, and objz. The function returns GL_TRUE, indicating success, or GL_FALSE, indicating failure (such as a noninvertible matrix). This operation does not attempt to clip the coordinates to the viewport or eliminate depth values that fall outside of glDepthRange().

There are inherent difficulties in trying to reverse the transformation process. A two-dimensional screen location could have originated from anywhere on an entire line in three-dimensional space. To disambiguate the result, gluUnProject() requires that a window depth coordinate (winz) be provided and that winz be specified in terms of glDepthRange(). For the default values of glDepthRange(), winz at 0.0 will request the world coordinates of the transformed point at the near clipping plane, while winz at 1.0 will request the point at the far clipping plane.
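
Before any matrix is involved, the window coordinates have to be prepared. The sketch below illustrates that step only, under stated assumptions: touch coordinates are in pixels with the origin at the top-left, the viewport covers the whole view, and the default glDepthRange() of [0, 1] is in effect. Ndc, touchToWindowY and windowToNdc are hypothetical helper names, not part of GLU; the mapping mirrors what the usual gluUnProject implementations do before applying the inverse matrix.

```cpp
struct Ndc { double x, y, z; };

// OpenGL window coordinates have their origin at the bottom-left, touch
// coordinates at the top-left (assumed), so y has to be flipped.
double touchToWindowY(double touchY, double viewHeight) {
    return viewHeight - touchY;
}

// Maps window coordinates to normalized device coordinates in [-1, 1].
// viewport is {x, y, width, height}; winZ is 0 at the near plane and 1 at the
// far plane for the default depth range.
Ndc windowToNdc(double winX, double winY, double winZ, const int viewport[4]) {
    return {(winX - viewport[0]) / viewport[2] * 2.0 - 1.0,
            (winY - viewport[1]) / viewport[3] * 2.0 - 1.0,
            winZ * 2.0 - 1.0};
}
```

Calling gluUnProject twice with the same winx and winy, once with winz = 0.0 and once with winz = 1.0, yields the two ends of the line described above.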

Example 3-8 (again, see the PDF) demonstrates gluUnProject() by reading the mouse position and determining the three-dimensional points at the near and far clipping planes from which it was transformed. The computed world coordinates are printed to standard output, but the rendered window itself is just black.

In terms of performance, I found this quickly via Google as an example of what you might not want to do using gluUnProject, with a link to what might lead to a better alternative. I have absolutely no idea how applicable it is to the iPhone, as I'm still a newb with OpenGL ES. Ask me again in a month. ;-)

仙女界的扛把子
#5 · 2020-07-13 07:31

Imagine a line that extends from the viewer's eye through the screen touch point into your 3D model space.

If that line intersects any of the cube's faces, then the user has touched the cube.
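
For the original goal of touching the center of a face, the intersection test can be narrowed to a single face: intersect the line with the face's plane and check how far the hit lands from the face center. A minimal sketch under those assumptions; Vec3, touchesFaceCenter and the radius tolerance are hypothetical, and the face center and outward normal are assumed to be in the same space as the ray:

```cpp
#include <cmath>

struct Vec3 { double x, y, z; };

Vec3 sub(const Vec3& a, const Vec3& b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
Vec3 add(const Vec3& a, const Vec3& b) { return {a.x + b.x, a.y + b.y, a.z + b.z}; }
Vec3 scale(const Vec3& a, double s) { return {a.x * s, a.y * s, a.z * s}; }
double dot(const Vec3& a, const Vec3& b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

// Returns true if the ray (origin + t * dir, t >= 0) hits the front side of the
// face's plane within `radius` of the face center.
bool touchesFaceCenter(const Vec3& origin, const Vec3& dir,
                       const Vec3& faceCenter, const Vec3& faceNormal,
                       double radius) {
    const double denom = dot(dir, faceNormal);
    // Parallel to the plane, or the face points away from the viewer.
    if (denom > -1e-12) return false;
    const double t = dot(sub(faceCenter, origin), faceNormal) / denom;
    if (t < 0.0) return false;  // plane is behind the ray origin
    const Vec3 hit = add(origin, scale(dir, t));
    const Vec3 offset = sub(hit, faceCenter);
    return dot(offset, offset) <= radius * radius;
}
```

Because back-facing faces are rejected, at most three faces of a cube can pass the test for any one ray, and their center regions do not overlap as long as the radius is small compared with the face size.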

一纸荒年 Trace。
#6 · 2020-07-13 07:34

Google for "opengl screen to world" (for example, there's a thread on GameDev.net where somebody wants to do exactly what you are looking for). There is a gluUnProject function that does precisely this, but it's not available on the iPhone, so you have to port it (see this source from the Mesa project). Or maybe there's already some publicly available source somewhere?
