I am trying to create a sample application where I can overlay 3D objects on the live camera view. They will be placed at a particular point and re-drawn every frame as the user moves the camera and the perspective shifts.
In essence, I'm looking to replicate this: http://www.youtube.com/watch?v=EEstFtQbzow
Here's my attempt at phrasing the problem more precisely: suppose I'm given an initial image matrix (representing all the X, Y pixel coordinates) at the time of initial object placement. Once the object is placed, every subsequent video frame will need to be analyzed to re-position it so that it can be re-drawn (overlaid) correctly given the new perspective.
I have a bit of a background in computer vision, but I am unsure how to do this particular task. For reference, the sample application I'm looking to create will be for Android, so if anyone knows of existing code I could leverage that would be great as well. However I'm very open to being directed to academic papers describing algorithms I need to implement.
Thanks.
This is a pretty well-known problem in computer vision. There are various papers you can refer to for this, including systems that do simultaneous localisation and mapping (SLAM), which may use either bundle adjustment or filter-based tracking. Reading up on popular papers on these topics will give you a lot of insight into cameras and tracking in the real world.
To summarise, you will need to obtain the 6D pose of the camera in every frame, i.e. you need to figure out where the camera is in the real world (translation) and where it is pointing (rotation). This is usually done by first tracking salient features in the scene, estimating their 3D positions, and then using the perceived motion of these features to figure out the camera pose in every frame. You will need to define an origin in the real world (you cannot use the camera as the origin for the problem you're trying to solve) and have at least 4 known/measured points as a reference to start with. In the video you've included in your question, Augment seem to use a printed pattern to get the initial camera pose. They then track features in the real world to continue tracking the pose.
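To make the pose-recovery step concrete, here is a minimal NumPy sketch of the classic Direct Linear Transform (DLT), which recovers the camera's projection matrix from known 3D-2D point correspondences. All names and the synthetic camera values are illustrative; a real system would use a robust PnP solver (e.g. OpenCV's `solvePnP` with RANSAC) rather than this bare-bones version:

```python
import numpy as np

def dlt_projection_matrix(world_pts, image_pts):
    """Estimate the 3x4 projection matrix P (pose + intrinsics, up to
    scale) from >= 6 non-coplanar 3D-2D correspondences via the
    Direct Linear Transform."""
    A = []
    for (X, Y, Z), (u, v) in zip(world_pts, image_pts):
        A.append([X, Y, Z, 1, 0, 0, 0, 0, -u * X, -u * Y, -u * Z, -u])
        A.append([0, 0, 0, 0, X, Y, Z, 1, -v * X, -v * Y, -v * Z, -v])
    # P is the null vector of A: the right singular vector associated
    # with the smallest singular value.
    _, _, Vt = np.linalg.svd(np.asarray(A, dtype=float))
    return Vt[-1].reshape(3, 4)

def project(P, xyz):
    """Project a 3D world point to 2D pixel coordinates with P."""
    u, v, w = P @ np.append(np.asarray(xyz, dtype=float), 1.0)
    return u / w, v / w

# Demo with a made-up camera: intrinsics K, identity rotation,
# translated 5 units along z. We synthesise measurements from the
# ground-truth camera, then recover it with the DLT.
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0,   0.0,   1.0]])
P_true = K @ np.hstack([np.eye(3), [[0.0], [0.0], [5.0]]])
world = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1),
         (1, 1, 0), (1, 0, 1), (0, 1, 1)]
image = [project(P_true, p) for p in world]
P_est = dlt_projection_matrix(world, image)
```

With noise-free correspondences, `P_est` reproduces the ground-truth projections; with real, noisy feature tracks you would feed many correspondences in and refine the result.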
Once you have the camera pose, you can position the 3D object in the real world using projections. The camera pose, together with the intrinsics, is encoded in the camera projection matrix, using which you can transform any 3D point in the world to a 2D position in the camera's frame. So to render a virtual 3D point in the real world, say at (x, y, z), you will project (x, y, z) to a 2D point (u, v) using the camera matrix. Then render the point on the image obtained from the camera. Do this for every point of the object you want to render, and you're done :)
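The rendering step above can be sketched in a few lines of NumPy. The intrinsics `K`, rotation `R`, and translation `t` below are placeholder values for illustration; in practice they come from your calibration and tracking pipeline:

```python
import numpy as np

# Hypothetical intrinsics for a 640x480 camera: focal length 800 px,
# principal point at the image centre.
K = np.array([[800.0,   0.0, 320.0],
              [  0.0, 800.0, 240.0],
              [  0.0,   0.0,   1.0]])

def project_point(K, R, t, xyz):
    """Map a 3D world point to a 2D pixel via P = K [R | t]."""
    cam = R @ np.asarray(xyz, dtype=float) + t   # world -> camera frame
    u, v, w = K @ cam                            # camera -> image plane
    return u / w, v / w

# Camera with identity rotation, pushed back 5 units along z.
R = np.eye(3)
t = np.array([0.0, 0.0, 5.0])

# A virtual point at the world origin lands at the principal point.
print(project_point(K, R, t, (0.0, 0.0, 0.0)))  # -> (320.0, 240.0)
```

Run this over every vertex of your 3D model each frame (with the frame's current `R`, `t`) and draw the result on top of the camera image.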
You should have a look at Vuforia, a mobile SDK developed by Qualcomm. It's free, and offers a lot of tools to add augmented reality to your applications.
As far as I know, it's what the guys from Augment (in your video) use in their app too!
It's a classic problem. In the Movie Visual Effects (VFX) industry it's called matchmoving. It boils down to solving the Structure from Motion (SfM) problem for the given image sequence, and specifically estimating the camera intrinsic parameters and position/pose at every frame with respect to an arbitrary origin point (for example, the position/pose at the first frame of the sequence).
Relevant search terms: "sfm", "matchmoving", "bundle adjustment", "ceres solver".
Google's Ceres solver, commonly used as a bundle adjuster, has been open-sourced and includes an Android port (which powers the "spherical" camera mode in recent releases).
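Concretely, bundle adjustment minimizes the total squared reprojection error over all camera poses and 3D points jointly. A rough NumPy sketch of that objective follows; the data layout (`cameras`, `points3d`, `observations`) is invented for illustration, and a solver like Ceres would minimize this with automatic or analytic derivatives rather than evaluate it directly:

```python
import numpy as np

def reprojection_error(cameras, points3d, observations):
    """Sum of squared reprojection errors -- the objective a bundle
    adjuster minimizes over all camera poses and scene points.

    cameras      : list of (K, R, t) tuples, one per frame
    points3d     : (N, 3) array of estimated world points
    observations : iterable of (frame_idx, point_idx, u, v) measurements
    """
    total = 0.0
    for f, p, u, v in observations:
        K, R, t = cameras[f]
        x = K @ (R @ points3d[p] + t)      # project point p into frame f
        du = x[0] / x[2] - u
        dv = x[1] / x[2] - v
        total += du * du + dv * dv         # squared pixel residual
    return total
```

When the estimated cameras and points are consistent with the measured feature positions, this error is zero; SfM/matchmoving systems iteratively adjust both to drive it down.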
Here is a step-by-step tutorial on how to build the furniture use case using the Metaio SDK (also free, with a small watermark). It also has its own rendering engine, so you can use it with or without Unity (native code): http://dev.metaio.com/sdk/tutorials/interactive-furniture/
The unique thing about this is that while the example in the video you show is limited to using a marker, Metaio's SDK lets you use SLAM environmental tracking, image/marker tracking, or GPS coordinate tracking to augment the 3D objects; in other words, you can do the same with or without a marker.
All the other information about tracking configurations, plus tutorials, live webinars, and sample code, can be found through the link above.
Hope this helps.