I want to augment a virtual object at x,y,z meters wrt camera. OpenCV has camera calibration functions but I don't understand how exactly I can give coordinates in meters
I tried simulating a Camera in Unity but don't get expected result.
I set the projection matrix as follows and create a unit cube at z = 2.415 + 0.5 . Where 2.415 is the distance between the eye and the projection plane (Pinhole camera model) Since the cube's face is at the front clipping plane and it's dimension are unit shouldn't it cover the whole viewport?
Matrix4x4 m = new Matrix4x4();
m[0, 0] = 1;
m[0, 1] = 0;
m[0, 2] = 0;
m[0, 3] = 0;
m[1, 0] = 0;
m[1, 1] = 1;
m[1, 2] = 0;
m[1, 3] = 0;
m[2, 0] = 0;
m[2, 1] = 0;
m[2, 2] = -0.01f;
m[2, 3] = 0;
m[3, 0] = 0;
m[3, 1] = 0;
m[3, 2] = -2.415f;
m[3, 3] = 0;
The global scale of your calibration (i.e. the units of measure of 3D space coordinates) is determined by the geometry of the calibration object you use. For example, when you calibrate in OpenCV using images of a flat checkerboard, the inputs to the calibration procedure are corresponding pairs (P, p) of 3D points P and their images p, the (X, Y, Z) coordinates of the 3D points are expressed in mm, cm, inches, miles, whatever, as required by the size of target you use (and the optics that images it), and the 2D coordinates of the images are in pixels. The output of the calibration routine is the set of parameters (the components of the projection matrix P and the non-linear distortion parameters k) that "convert" 3D coordinates expressed in those metrical units into pixels.
If you don't know (or don't want to use) the actual dimensions of the calibration target, you can just fudge them but leave their ratios unchanged (so that, for example, a square remains a square even though the true length of its side may be unknown). In this case your calibration will be determined up to an unknown global scale. This is actually the common case: in most virtual reality applications you don't really care what the global scale is, as long as the results look correct in the image.
For example, if you want to add an even puffier pair of 3D lips on a video of Angelina Jolie, and composite them with the original video so that the brand new fake lips stay attached and look "natural" on her face, you just need to rescale the 3D model of the fake lips so that it overlaps correctly the image of the lips. Whether the model is 1 yard or one mile away from the CG camera in which you render the composite is completely irrelevant.
For finding augmenting an object you need to find camera pose and orientation. That is the same as finding the camera extrinsics. You also have to calculate first the camera intrinsics (which is called calibraiton).
OpenCV allows you to do all of this, but is not trivial, it requires work on your own. I give you a clue, you first need to recognize something in the scene that you know how it looks, so you can calculate the camera pose by analyzing this object, call it a marker. You can start by the tipical fiducials, they are easy to detect.
Have a look at this thread.
I ended up measuring the field of view manually. Once you know FOV you can easily create the projection matrix. No need to worry about units because in the end the projection is of the form ( X*d/Z, Y*d/Z ). Whatever the units of X,Y,Z may be the ratio of X/Z remains the same.