Google ARCore Domain Model by Example

I'm trying to read and make sense of Google ARCore's domain model, particularly the Android SDK packages. Currently this SDK is in "preview" mode and so there are no tutorials, blogs, articles, etc. available on understanding how to use this API. Even Google itself suggests just reading the source code, source code comments and Javadocs to understand how to use the API. Problem is: if you're not already a computer vision expert, the domain model will feel a little alien & unfamiliar to you.

Specifically I'm interested in understanding the fundamental differences between, and proper usages of, the following classes:

Frame
Anchor
Pose
PointCloud

According to Anchor's javadoc:

"Describes a fixed location and orientation in the real world. To stay at a fixed location in physical space, the numerical description of this position will update as ARCore's understanding of the space improves. Use getPose() to get the current numerical location of this anchor. This location may change any time update() is called, but will never spontaneously change."

So Anchors have a Pose. Sounds like you "drop an Anchor" onto something thats visible in the camera, and then ARCore tracks that Anchor and constantly updates its Pose to reflect the nature of its onscreen coordinates maybe?

And from Pose's javadoc:

"Represents an immutable rigid transformation from one coordinate frame to another. As provided from all ARCore APIs, Poses always describe the transformation from object's local coordinate frame to the world coordinate frame (see below)...These changes mean that every frame should be considered to be in a completely unique world coordinate frame."

So it sounds like a Pose is something that is only unique to the "current frame" of the camera and that each time the frame is updated, all poses for all anchors are recalculated maybe? If not, then what's the relationship between an Anchor, its Pose, the current frame and the world coordinate frame? And what's a Pose really, anyways? Is a "Pose" just a way of storing matrix/point data so that you can convert an Anchor from the current frame to the world frame? Or something else?

Finally, I see a strong correlation between Frames, Poses and Anchors, but then there's PointCloud. The only class I can see inside com.google.ar.core that uses these is the Frame. PointClouds appear to be (x,y,z)-coordinates with a 4th property representing ARCore's "confidence" that the x/y/z components are actually correct. So if an Anchor has a Pose, I would have imagined that a Pose would also have a PointCloud representing the Anchor's coordinates & confidence in those coordinates. But Pose does not have a PointCloud, and so I must be completely misunderstanding the concepts that these two classes model.

The question

I've posed several different questions above, but they all boil down to a single, concise, answerable question:

What is the difference in the concepts behind Frame, Anchor, Pose and PointCloud and when do you use each of them (and for what purposes)?

A Pose is a structured transformation. It is a fixed numerical transformation from one coordinate system (typically object local) to another (typically world).

An Anchor represents a physically fixed location in the world. It's getPose() will update as the understanding of the world changes. For example, imagine you have a building with a hallway around the outside. If you walk all the way around that hallway, sensor drift results in you not winding up at the same coordinates you started at. However, ARCore can detect (using visual features) that it is in the same space it started it. When this happens, it distorts the world so that your current location and original location line up. As part of this distortion, the location of anchors will be adjusted as well so that they stay in the same physical place.

Because of this distortion, a Pose relative to the world should be considered valid only for the duration of the frame during which it was returned. As soon as you call update() the next time, the world may have reshaped at that pose could be useless. If you need to keep a location longer than a frame, create an Anchor. Just make sure to removeAnchors() anchors that you're no longer using, as there is ongoing cost for each live anchor.

A Frame captures the current state at an instant and changes between two calls to update().

PointClouds are sets of 3D visual feature points detected in the world. They are in their own local coordinate system, which can be accessed from Frame.getPointCloudPose(). Developers looking to have better spatial understanding than the plane detection provides can try using the point clouds to learn more about the structure of the 3D world.

Does that help?