I'm trying to read and make sense of Google ARCore's domain model, particularly the Android SDK packages. Currently this SDK is in "preview" mode and so there are no tutorials, blogs, articles, etc. available on understanding how to use this API. Even Google itself suggests just reading the source code, source code comments and Javadocs to understand how to use the API. Problem is: if you're not already a computer vision expert, the domain model will feel a little alien & unfamiliar to you.
Specifically I'm interested in understanding the fundamental differences between, and proper usages of, the following classes:
Frame
Anchor
Pose
PointCloud
According to Anchor's Javadoc:
"Describes a fixed location and orientation in the real world. To stay at a fixed location in physical space, the numerical description of this position will update as ARCore's understanding of the space improves. Use getPose() to get the current numerical location of this anchor. This location may change any time update() is called, but will never spontaneously change."
So Anchors have a Pose. Sounds like you "drop an Anchor" onto something that's visible in the camera, and then ARCore tracks that Anchor and constantly updates its Pose to reflect its onscreen coordinates, maybe?
And from Pose's Javadoc:
"Represents an immutable rigid transformation from one coordinate frame to another. As provided from all ARCore APIs, Poses always describe the transformation from object's local coordinate frame to the world coordinate frame (see below)...These changes mean that every frame should be considered to be in a completely unique world coordinate frame."
So it sounds like a Pose is only meaningful relative to the "current frame" of the camera, and that each time the frame is updated, the poses of all anchors are recalculated, maybe? If not, then what's the relationship between an Anchor, its Pose, the current frame and the world coordinate frame? And what is a Pose, really? Is it just a way of storing matrix/point data so that you can convert an Anchor from the current frame to the world frame? Or something else?
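To make my current mental model of a Pose concrete, here is a minimal plain-Java sketch of a rigid transformation (a rotation followed by a translation) that maps a point from an object's local frame into the world frame. It does not use the ARCore API at all: RigidTransform, transformPoint and the (x, y, z, w) quaternion layout are names and conventions I made up for illustration, so please correct me if this is not what a Pose represents.

```java
import java.util.Arrays;

// Hypothetical stand-in for my understanding of a Pose; NOT an ARCore class.
final class RigidTransform {
    private final float[] translation; // (tx, ty, tz)
    private final float[] rotation;    // unit quaternion, assumed order (x, y, z, w)

    RigidTransform(float[] translation, float[] rotation) {
        // Defensive copies so the transform stays immutable, like the Javadoc describes.
        this.translation = translation.clone();
        this.rotation = rotation.clone();
    }

    // Maps a point from the local frame to the world frame: rotate, then translate.
    float[] transformPoint(float[] p) {
        float[] u = {rotation[0], rotation[1], rotation[2]};
        float w = rotation[3];
        // Quaternion rotation: p' = p + 2w(u x p) + 2(u x (u x p))
        float[] c1 = cross(u, p);
        float[] c2 = cross(u, c1);
        return new float[] {
            p[0] + 2f * (w * c1[0] + c2[0]) + translation[0],
            p[1] + 2f * (w * c1[1] + c2[1]) + translation[1],
            p[2] + 2f * (w * c1[2] + c2[2]) + translation[2]
        };
    }

    private static float[] cross(float[] a, float[] b) {
        return new float[] {
            a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]
        };
    }

    public static void main(String[] args) {
        float s = (float) Math.sqrt(0.5); // sin(45 deg) == cos(45 deg)
        // 90 degree rotation about +Y, then a shift of (1, 0, -2).
        RigidTransform t = new RigidTransform(
            new float[] {1f, 0f, -2f},
            new float[] {0f, s, 0f, s});
        System.out.println(Arrays.toString(t.transformPoint(new float[] {1f, 0f, 0f})));
    }
}
```

If that picture is right, then an Anchor would simply hand me a fresh transform like this whenever ARCore refines its estimate, and I would re-apply it to my object's local coordinates every frame.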
Finally, I see a strong correlation between Frames, Poses and Anchors, but then there's PointCloud. The only class I can see inside com.google.ar.core that uses it is Frame. PointClouds appear to be sets of (x,y,z) coordinates, each with a 4th property representing ARCore's "confidence" that the x/y/z components are actually correct. So if an Anchor has a Pose, I would have imagined that a Pose would also have a PointCloud representing the Anchor's coordinates & confidence in those coordinates. But Pose does not have a PointCloud, and so I must be completely misunderstanding the concepts that these two classes model.
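Here is how I currently read the point cloud data, again as a plain-Java sketch rather than real ARCore code. I am assuming the points arrive as a flat buffer of floats, four per point, in the order x, y, z, confidence; PointCloudSketch, confidentPoints and the 0.5 threshold are hypothetical names and values of my own.

```java
import java.nio.FloatBuffer;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper illustrating my reading of the data layout; NOT an ARCore class.
final class PointCloudSketch {
    // Assumed layout: x, y, z, confidence for each point.
    static final int FLOATS_PER_POINT = 4;

    // Returns the (x, y, z) of every point whose confidence is at least minConfidence.
    static List<float[]> confidentPoints(FloatBuffer points, float minConfidence) {
        List<float[]> result = new ArrayList<>();
        // Work on a duplicate so the caller's buffer position is left untouched.
        FloatBuffer view = points.duplicate();
        view.rewind();
        while (view.remaining() >= FLOATS_PER_POINT) {
            float x = view.get();
            float y = view.get();
            float z = view.get();
            float confidence = view.get();
            if (confidence >= minConfidence) {
                result.add(new float[] {x, y, z});
            }
        }
        return result;
    }

    public static void main(String[] args) {
        FloatBuffer cloud = FloatBuffer.wrap(new float[] {
            0.1f, 0.2f, -1.0f, 0.9f,  // high-confidence point
            0.4f, 0.0f, -2.5f, 0.2f   // low-confidence point
        });
        System.out.println(confidentPoints(cloud, 0.5f).size() + " point(s) kept");
    }
}
```

Nothing in this layout ties a point to an Anchor or a Pose, which is exactly the part I cannot reconcile.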
The question
I've posed several different questions above, but they all boil down to a single, concise, answerable question:
What is the difference in the concepts behind Frame, Anchor, Pose and PointCloud and when do you use each of them (and for what purposes)?