How to make a 3D model from AVDepthData?

I’m interested in processing data from the TrueDepth camera. I need to capture a person’s face data, build a 3D model of the face, and save that model to an .obj file.

Since the 3D model needs to include the person’s eyes and teeth, ARKit / SceneKit is not suitable, because ARKit / SceneKit do not fill these areas with data.

With the help of the SceneKit.ModelIO module, I did manage to export ARSCNView.scene (of type SCNScene) to the .obj format. I then tried to use this sample project as a basis: https://developer.apple.com/documentation/avfoundation/cameras_and_media_capture/streaming_depth_data_from_the_truedepth_camera

That project works with the TrueDepth camera using Metal, but if I'm not mistaken, an MTKView rendered with Metal is not a 3D model and cannot be exported as .obj.

Is there a way to export an MTKView to an SCNScene, or directly to .obj? If there is no such method, how can I make a 3D model from AVDepthData?

Thanks.

It's possible to make a 3D model from AVDepthData, but that probably isn't what you want. One depth buffer is just that — a 2D array of pixel distance-from-camera values. So the only "model" you're getting from that isn't very 3D; it's just a height map. That means you can't look at it from the side and see contours that you couldn't have seen from the front. (The "Using Depth Data" sample code attached to the WWDC 2017 talk on depth photography shows an example of this.)
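To make the "height map" point concrete, here is a minimal sketch that turns a grid of depth values into OBJ text. It assumes you have already copied the Float32 values out of the depth buffer (for example from `depthDataMap` after `converting(toDepthDataType: kCVPixelFormatType_DepthFloat32)`). The function name `heightMapOBJ` and the `pixelSize` spacing are hypothetical, and the even spacing is a simplification; a more faithful version would unproject each pixel using the camera intrinsics.

```swift
import Foundation

/// Hypothetical helper: converts a row-major grid of depth values (meters)
/// into Wavefront OBJ text. Each pixel becomes one vertex and each 2x2 block
/// of pixels becomes two triangles.
/// Assumption: pixels are evenly spaced `pixelSize` meters apart, which
/// ignores the perspective projection of the real camera.
func heightMapOBJ(depth: [Float], width: Int, height: Int,
                  pixelSize: Float = 0.001) -> String {
    precondition(width > 0 && height > 0, "grid must not be empty")
    precondition(depth.count == width * height, "depth must hold width * height values")

    var lines: [String] = []

    // One vertex per pixel. Image rows grow downward, so y is negated;
    // depth points away from the camera, so z is negated as well.
    for y in 0..<height {
        for x in 0..<width {
            let z = depth[y * width + x]
            lines.append("v \(Float(x) * pixelSize) \(Float(-y) * pixelSize) \(-z)")
        }
    }

    // Two triangles per grid cell, wound so the faces point back at the camera.
    for y in 0..<(height - 1) {
        for x in 0..<(width - 1) {
            let i = y * width + x + 1 // OBJ vertex indices are 1-based
            lines.append("f \(i) \(i + width) \(i + 1)")
            lines.append("f \(i + 1) \(i + width) \(i + width + 1)")
        }
    }
    return lines.joined(separator: "\n") + "\n"
}
```

Writing the returned string to a file ending in .obj gives you something a mesh viewer can open, but as noted above it is a single-sided relief, not a closed model.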

If you want more of a truly-3D "model", akin to what ARKit offers, you need to be doing the work that ARKit does — using multiple color and depth frames over time, along with a machine learning system trained to understand human faces (and hardware optimized for running that system quickly). You might not find doing that yourself to be a viable option...

It is possible to get an exportable model out of ARKit using Model I/O. The outline of the code you'd need goes something like this:

1. Get ARFaceGeometry from a face tracking session.
2. Create MDLMeshBuffers from the face geometry's vertices, textureCoordinates, and triangleIndices arrays. (Apple notes the texture coordinate and triangle index arrays never change, so you only need to create those once; vertices you have to update every time you get a new frame.)
3. Create a MDLSubmesh from the index buffer, and a MDLMesh from the submesh plus vertex and texture coordinate buffers. (Optionally, use MDLMesh functions to generate a vertex normals buffer after creating the mesh.)
4. Create an empty MDLAsset and add the mesh to it.
5. Export the MDLAsset to a URL (providing a URL with the .obj file extension so that it infers the format you want to export).
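The steps above could be sketched roughly as follows. This is untested outline code, not a definitive implementation: the function name `exportOBJ(from:to:)` is hypothetical, and for brevity it rebuilds all three buffers on every call rather than caching the texture coordinate and index buffers.

```swift
import ARKit
import ModelIO

/// Hypothetical helper: writes one ARFaceGeometry snapshot to `url`.
/// Pass a URL ending in ".obj" so Model I/O infers the export format.
func exportOBJ(from face: ARFaceGeometry, to url: URL) throws {
    let allocator = MDLMeshBufferDataAllocator()

    // Step 2: copy ARKit's arrays into Model I/O mesh buffers.
    let vertexData = face.vertices.withUnsafeBufferPointer { Data(buffer: $0) }
    let uvData = face.textureCoordinates.withUnsafeBufferPointer { Data(buffer: $0) }
    let indexData = face.triangleIndices.withUnsafeBufferPointer { Data(buffer: $0) }

    let vertexBuffer = allocator.newBuffer(with: vertexData, type: .vertex)
    let uvBuffer = allocator.newBuffer(with: uvData, type: .vertex)
    let indexBuffer = allocator.newBuffer(with: indexData, type: .index)

    // Describe the layout: positions in buffer 0, texture coordinates in buffer 1.
    // SIMD3<Float> is padded to 16 bytes, so use its stride rather than 12.
    let descriptor = MDLVertexDescriptor()
    descriptor.attributes[0] = MDLVertexAttribute(name: MDLVertexAttributePosition,
                                                  format: .float3, offset: 0, bufferIndex: 0)
    descriptor.attributes[1] = MDLVertexAttribute(name: MDLVertexAttributeTextureCoordinate,
                                                  format: .float2, offset: 0, bufferIndex: 1)
    descriptor.layouts[0] = MDLVertexBufferLayout(stride: MemoryLayout<SIMD3<Float>>.stride)
    descriptor.layouts[1] = MDLVertexBufferLayout(stride: MemoryLayout<SIMD2<Float>>.stride)

    // Step 3: submesh from the indices (ARKit reports them as Int16), then the mesh.
    let submesh = MDLSubmesh(indexBuffer: indexBuffer,
                             indexCount: face.triangleIndices.count,
                             indexType: .uInt16,
                             geometryType: .triangles,
                             material: nil)
    let mesh = MDLMesh(vertexBuffers: [vertexBuffer, uvBuffer],
                       vertexCount: face.vertices.count,
                       descriptor: descriptor,
                       submeshes: [submesh])

    // Optional: have Model I/O generate vertex normals.
    mesh.addNormals(withAttributeNamed: MDLVertexAttributeNormal, creaseThreshold: 0.5)

    // Steps 4-5: wrap the mesh in an asset and export it.
    let asset = MDLAsset()
    asset.add(mesh)
    try asset.export(to: url)
}
```

You would call this with `faceAnchor.geometry` from a session delegate callback when you want to capture a frame.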

That sequence doesn't require SceneKit (or Metal, or any ability to display the mesh) at all, which might prove useful depending on your need. If you do want to involve SceneKit and Metal you can probably skip a few steps:

1. Create ARSCNFaceGeometry on your Metal device and pass it an ARFaceGeometry from a face tracking session.
2. Use MDLMesh(scnGeometry:) to get a Model I/O representation of that geometry, then follow steps 4-5 above to export it to an .obj file.
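A rough, untested sketch of that shorter route is below. The function name `exportOBJViaSceneKit(from:to:)` and the `FaceExportError` type are made up for illustration.

```swift
import ARKit
import Metal
import SceneKit
import SceneKit.ModelIO

/// Hypothetical error type for this sketch.
enum FaceExportError: Error {
    case noMetalDevice
    case geometryUnavailable
}

/// Hypothetical helper: exports the current face mesh via SceneKit's Model I/O bridge.
/// Pass a URL ending in ".obj" so Model I/O infers the export format.
func exportOBJViaSceneKit(from face: ARFaceGeometry, to url: URL) throws {
    guard let device = MTLCreateSystemDefaultDevice() else {
        throw FaceExportError.noMetalDevice
    }
    // The initializer is failable, for example on hardware without face tracking.
    guard let scnGeometry = ARSCNFaceGeometry(device: device) else {
        throw FaceExportError.geometryUnavailable
    }
    // Copy the latest tracked vertices into the SceneKit geometry.
    scnGeometry.update(from: face)

    // Bridge to Model I/O, then add to an asset and export as in steps 4-5.
    let mesh = MDLMesh(scnGeometry: scnGeometry)
    let asset = MDLAsset()
    asset.add(mesh)
    try asset.export(to: url)
}
```

If you already display the face with an ARSCNFaceGeometry in your ARSCNView, you can reuse that instance instead of creating a new one here.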

Any way you slice it, though... if it's a strong requirement to model eyes and teeth, none of the Apple-provided options will help you because none of them do that. So, some food for thought:

- Consider whether that's a strong requirement?
- Replicate all of Apple's work to do your own face-model inference from color + depth image sequences?
- Cheat on eye modeling using spheres centered according to the leftEyeTransform/rightEyeTransform reported by ARKit?
- Cheat on teeth modeling using a pre-made model of teeth, composed with the ARKit-provided face geometry for display? (Articulate your inner-jaw model with a single open-shut joint and use ARKit's blendShapes[.jawOpen] to animate it alongside the face.)
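For the two "cheat" options, a minimal untested sketch might look like this. The function names, the 12 mm eye radius, and the maximum jaw angle are all assumptions to tune for your own models; `jawNode` stands in for whatever pre-made teeth/jaw model you supply.

```swift
import ARKit
import SceneKit

/// Hypothetical helper: one sphere per eye, positioned from ARKit's eye transforms.
/// Those transforms are relative to the face anchor, so add the returned nodes
/// as children of the node that displays the face geometry.
func makeEyeNodes(for anchor: ARFaceAnchor, radius: CGFloat = 0.012) -> [SCNNode] {
    return [anchor.leftEyeTransform, anchor.rightEyeTransform].map { transform in
        let node = SCNNode(geometry: SCNSphere(radius: radius))
        node.simdTransform = transform
        return node
    }
}

/// Hypothetical helper: rotates a pre-made jaw model around its x-axis
/// according to ARKit's jawOpen coefficient (0 = closed, 1 = fully open).
/// Assumption: the jaw node's pivot sits at the hinge and `maxAngle` (radians)
/// matches how far your model should open.
func updateJaw(_ jawNode: SCNNode, for anchor: ARFaceAnchor, maxAngle: Float = .pi / 6) {
    let open = anchor.blendShapes[.jawOpen]?.floatValue ?? 0
    jawNode.simdEulerAngles.x = open * maxAngle
}
```

You would call `updateJaw` (and re-apply the eye transforms) from `renderer(_:didUpdate:for:)` each time the face anchor updates.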