iOS - Creating multiple time-delayed live camera previews

Published 2019-07-12 10:55

Question:

I have done a ton of research, and haven't yet been able to find a viable solution, for many reasons, which I will outline below.


Problem

In my iOS app, I want three views that indefinitely show a delayed-live preview of the device's camera.

For example, view 1 will show the camera feed delayed by 5s, view 2 will show the same feed delayed by 20s, and view 3 will show it delayed by 30s.

This would be used to record yourself performing some kind of activity, such as a workout exercise, and then watch yourself a few seconds later in order to perfect your form for a given exercise.

Solutions Tried

I have tried and researched a couple different solutions, but all have problems.

1. Using AVFoundation and AVCaptureMovieFileOutput:

  • Use AVCaptureSession and AVCaptureMovieFileOutput to record short clips to device storage. Short clips are required because you cannot play video from a URL and write to that same URL simultaneously. (A sketch of this setup follows this list.)
  • Have 3 AVPlayer and AVPlayerLayer instances, all playing the short recorded clips at their desired time-delays.
  • Problems:
    1. When switching clips using AVPlayer.replaceCurrentItem(_:), there is a very noticeable delay between clips. This needs to be a smooth transition.
    2. Although old, a comment here suggests not creating multiple AVPlayer instances due to a device limit. I haven't been able to find information confirming or denying this statement. Edit: per Jake G's comment, 10 AVPlayer instances are fine on an iPhone 5 and newer.
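Here is a rough sketch (not from the original post) of how the clip-based approach could be wired up. The 5-second clip length, the single delayed player, and the file naming are assumptions; a real app would keep one AVPlayer per delayed view.

```swift
import AVFoundation

// Sketch of Solution 1: record short clips with AVCaptureMovieFileOutput and
// feed each finished clip to a delayed AVPlayer. Clip length and file naming
// are illustrative assumptions.
final class ClipRecorder: NSObject, AVCaptureFileOutputRecordingDelegate {
    let session = AVCaptureSession()
    let movieOutput = AVCaptureMovieFileOutput()
    let delayedPlayer = AVPlayer()      // in practice, one player per delayed view
    private var clipIndex = 0

    func start() throws {
        guard let camera = AVCaptureDevice.default(for: .video) else { return }
        session.addInput(try AVCaptureDeviceInput(device: camera))
        session.addOutput(movieOutput)
        session.startRunning()
        recordNextClip()
    }

    private func recordNextClip() {
        // Each clip covers the delay interval (5s here, an assumption).
        movieOutput.maxRecordedDuration = CMTime(seconds: 5, preferredTimescale: 600)
        let url = FileManager.default.temporaryDirectory
            .appendingPathComponent("clip-\(clipIndex).mov")
        try? FileManager.default.removeItem(at: url)
        clipIndex += 1
        movieOutput.startRecording(to: url, recordingDelegate: self)
    }

    // Fires when a clip reaches maxRecordedDuration: hand it to the player and
    // immediately start the next clip. The gap between clips is where the
    // visible stutter described above comes from.
    func fileOutput(_ output: AVCaptureFileOutput,
                    didFinishRecordingTo outputFileURL: URL,
                    from connections: [AVCaptureConnection],
                    error: Error?) {
        delayedPlayer.replaceCurrentItem(with: AVPlayerItem(url: outputFileURL))
        delayedPlayer.play()
        recordNextClip()
    }
}
```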

2. Using AVFoundation and AVCaptureVideoDataOutput:

  • Use AVCaptureSession and AVCaptureVideoDataOutput to stream and process each frame of the camera's feed using the didOutputSampleBuffer delegate method. (A sketch of the delegate wiring follows this list.)
  • Draw each frame on an OpenGL view (such as GLKViewWithBounds). This solves the problem of multiple AVPlayer instances from Solution 1.
  • Problem: Storing each frame so they can be displayed later requires copious amounts of memory (which just isn't viable on an iOS device), or disk space. If I want to store a 2 minute video at 30 frames per second, that's 3600 frames, totalling over 12GB if copied directly from didOutputSampleBuffer. Maybe there is a way to compress each frame x1000 without losing quality that would allow me to keep this data in memory. If such a method exists, I haven't been able to find it.
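To make the frame-streaming approach concrete, here is a minimal sketch (my own; the YUV pixel format and queue name are assumptions) of how AVCaptureVideoDataOutput delivers each frame to a delegate:

```swift
import AVFoundation

// Sketch of Solution 2: receive every camera frame via AVCaptureVideoDataOutput.
// Whatever is done with the pixel buffer (draw it with GL/Metal, keep it around)
// is up to the caller; retaining thousands of buffers is exactly the memory
// problem described above.
final class FrameTap: NSObject, AVCaptureVideoDataOutputSampleBufferDelegate {
    let session = AVCaptureSession()
    private let videoOutput = AVCaptureVideoDataOutput()
    private let queue = DispatchQueue(label: "camera.frames")

    func start() throws {
        guard let camera = AVCaptureDevice.default(for: .video) else { return }
        session.addInput(try AVCaptureDeviceInput(device: camera))
        videoOutput.videoSettings = [
            kCVPixelBufferPixelFormatTypeKey as String:
                kCVPixelFormatType_420YpCbCr8BiPlanarVideoRange  // YUV, ~1.5 bytes/pixel
        ]
        videoOutput.setSampleBufferDelegate(self, queue: queue)
        session.addOutput(videoOutput)
        session.startRunning()
    }

    // Called once per captured frame (e.g. 30 times per second).
    func captureOutput(_ output: AVCaptureOutput,
                       didOutput sampleBuffer: CMSampleBuffer,
                       from connection: AVCaptureConnection) {
        guard let pixelBuffer = CMSampleBufferGetImageBuffer(sampleBuffer) else { return }
        // Hand pixelBuffer to a renderer, or to an AVAssetWriter (see the answer below).
        _ = pixelBuffer
    }
}
```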

Possible 3rd Solution

If there is a way to read and write to a file simultaneously, I believe the following solution would be ideal.

  • Record video as a circular stream. For example, for a video buffer of 2 minutes, I would create a file output stream that writes frames for two minutes. Once the 2-minute mark is hit, the stream restarts from the beginning, overwriting the original frames.
  • With this file output stream constantly running, I would have 3 input streams on the same recorded video file. Each stream would point to a different frame in the file (effectively X seconds behind the writing stream). Each frame would then be displayed on the input stream's respective UIView.

Of course, this still has an issue of storage space. Even if frames were stored as compressed JPEG images, we're talking about multiple GBs of storage required for a lower-quality, 2-minute video.
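To illustrate the circular-stream idea without committing to a file format, here is a toy in-memory sketch: one writer wraps around a fixed-capacity buffer while readers trail it by different offsets. The Frame type and capacity are placeholders; a real implementation would back this with a file or encoded chunks rather than raw frames in memory.

```swift
// Toy sketch of the circular stream: a single writer overwrites the oldest slot,
// and each reader fetches the frame written `delayFrames` ago.
struct DelayedFrameRing<Frame> {
    private var slots: [Frame?]
    private var writeIndex = 0      // next slot the writer will overwrite
    private var framesWritten = 0

    init(capacity: Int) {
        slots = Array(repeating: nil, count: capacity)
    }

    // Writer: like the 2-minute file that wraps around and overwrites its start.
    mutating func write(_ frame: Frame) {
        slots[writeIndex] = frame
        writeIndex = (writeIndex + 1) % slots.count
        framesWritten += 1
    }

    // Reader: the frame written `delayFrames` ago, if it has not been overwritten.
    func read(delayFrames: Int) -> Frame? {
        guard delayFrames < slots.count, framesWritten > delayFrames else { return nil }
        let index = (writeIndex - 1 - delayFrames + 2 * slots.count) % slots.count
        return slots[index]
    }
}
```

At 30 fps, a 5-second delay corresponds to read(delayFrames: 150), and a 2-minute window needs 3,600 slots, which is why the per-frame storage cost matters so much.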

Question

  1. Does anyone know of an efficient method to achieve what I want?
  2. How can I fix some of the problems in the solutions I've already tried?

Answer 1:

  1. On iOS, AVCaptureMovieFileOutput drops frames when switching files. On macOS this doesn't happen. There's a discussion of this in the header file; see captureOutputShouldProvideSampleAccurateRecordingStart.

A combination of your 2. and 3. should work. You need to write the video file in chunks using AVCaptureVideoDataOutput and AVAssetWriter instead of AVCaptureMovieFileOutput so you don't drop frames. Add 3 ring buffers with enough storage to keep up with playback, and use GLES or Metal to display your buffers (use YUV instead of RGBA; at 1.5 bytes per pixel instead of 4 it takes roughly 2.7x less memory).
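Here is a sketch of the chunked-writing part of this suggestion, assuming frames arrive from an AVCaptureVideoDataOutput delegate as in option 2. The chunk length, codec settings, and file naming are placeholders, not part of the original answer.

```swift
import AVFoundation

// Encode incoming sample buffers into short files with AVAssetWriter instead of
// AVCaptureMovieFileOutput, so no frames are lost at the chunk boundary.
final class ChunkWriter {
    private var writer: AVAssetWriter?
    private var input: AVAssetWriterInput?
    private var chunkStart = CMTime.invalid
    private var chunkIndex = 0
    private let chunkDuration = CMTime(seconds: 5, preferredTimescale: 600)

    // Call from captureOutput(_:didOutput:from:) for every video sample buffer.
    func append(_ sampleBuffer: CMSampleBuffer) {
        let time = CMSampleBufferGetPresentationTimeStamp(sampleBuffer)

        if writer == nil {
            startChunk(at: time)
        } else if CMTimeSubtract(time, chunkStart) >= chunkDuration {
            finishChunk()           // close the current file...
            startChunk(at: time)    // ...and start the next one without dropping this frame
        }

        if input?.isReadyForMoreMediaData == true {
            _ = input?.append(sampleBuffer)
        }
    }

    private func startChunk(at time: CMTime) {
        let url = FileManager.default.temporaryDirectory
            .appendingPathComponent("chunk-\(chunkIndex).mov")
        try? FileManager.default.removeItem(at: url)
        chunkIndex += 1

        guard let writer = try? AVAssetWriter(outputURL: url, fileType: .mov) else { return }
        let input = AVAssetWriterInput(mediaType: .video, outputSettings: [
            AVVideoCodecKey: AVVideoCodecType.h264,
            AVVideoWidthKey: 1280,
            AVVideoHeightKey: 720
        ])
        input.expectsMediaDataInRealTime = true
        writer.add(input)
        writer.startWriting()
        writer.startSession(atSourceTime: time)

        self.writer = writer
        self.input = input
        self.chunkStart = time
    }

    private func finishChunk() {
        input?.markAsFinished()
        writer?.finishWriting { /* the chunk file is now readable by a delayed reader */ }
        writer = nil
        input = nil
    }
}
```

Each delayed view then reads from the chunk that is X seconds behind the one currently being written, which is where the ring buffers mentioned above come in.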

I tried a more modest version of this back in the days of the mighty iPhone 4s and iPad 2. It showed (I think) the live feed and 10s in the past. I guesstimated that because you could encode 30fps at 3x realtime, I should be able to encode the chunks and read back the previous ones using only 2/3 of the hardware capacity. Sadly, either my idea was wrong, there was some non-linearity in the hardware, or the code was wrong, and the encoder kept falling behind.