I am using an AVCaptureSession to capture video and audio input and encode an H.264 video with AVAssetWriter.
If I don't write the audio, the video is encoded as expected. But if I write the audio, I am getting a corrupt video.
If I inspect the audio CMSampleBuffer being supplied to the AVAssetWriter, it shows this information:
invalid = NO
dataReady = YES
makeDataReadyCallback = 0x0
makeDataReadyRefcon = 0x0
formatDescription = <CMAudioFormatDescription 0x17410ba30 [0x1b3a70bb8]> {
mediaType:'soun'
mediaSubType:'lpcm'
mediaSpecific: {
ASBD: {
mSampleRate: 44100.000000
mFormatID: 'lpcm'
mFormatFlags: 0xc
mBytesPerPacket: 2
mFramesPerPacket: 1
mBytesPerFrame: 2
mChannelsPerFrame: 1
mBitsPerChannel: 16 }
cookie: {(null)}
ACL: {(null)}
FormatList Array: {(null)}
}
extensions: {(null)}
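For reference, the `mFormatFlags: 0xc` value in the dump can be decoded against the Core Audio linear PCM flag constants. The following is a minimal sketch; the helper name `describeLPCMFlags` is hypothetical and only the most common flags are checked.

```swift
import AVFoundation

// Hypothetical helper: returns readable names for the linear PCM flags set in `flags`.
// Only a subset of the Core Audio flag constants is checked here.
func describeLPCMFlags(_ flags: AudioFormatFlags) -> [String] {
    let known: [(AudioFormatFlags, String)] = [
        (kAudioFormatFlagIsFloat, "float"),
        (kAudioFormatFlagIsBigEndian, "big-endian"),
        (kAudioFormatFlagIsSignedInteger, "signed integer"),
        (kAudioFormatFlagIsPacked, "packed"),
        (kAudioFormatFlagIsNonInterleaved, "non-interleaved")
    ]
    // Keep the names whose bit is set in `flags`.
    return known.filter { flags & $0.0 != 0 }.map { $0.1 }
}

print(describeLPCMFlags(0xc)) // ["signed integer", "packed"]
```

So the buffers are 16-bit signed integer, packed, little-endian, interleaved, mono samples at 44.1 kHz.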
Since it is supplying lpcm audio, I have configured the AVAssetWriterInput with this setting for sound (I have tried both one and two channels):
var channelLayout = AudioChannelLayout()
memset(&channelLayout, 0, MemoryLayout<AudioChannelLayout>.size)
channelLayout.mChannelLayoutTag = kAudioChannelLayoutTag_Mono

let audioOutputSettings: [String: Any] = [
    AVFormatIDKey as String: UInt(kAudioFormatLinearPCM),
    AVNumberOfChannelsKey as String: 1,
    AVSampleRateKey as String: 44100.0,
    AVLinearPCMIsBigEndianKey as String: false,
    AVLinearPCMIsFloatKey as String: false,
    AVLinearPCMBitDepthKey as String: 16,
    AVLinearPCMIsNonInterleaved as String: false,
    AVChannelLayoutKey: NSData(bytes: &channelLayout, length: MemoryLayout<AudioChannelLayout>.size)
]

self.assetWriterAudioInput = AVAssetWriterInput(mediaType: AVMediaTypeAudio, outputSettings: audioOutputSettings)
self.assetWriter.add(self.assetWriterAudioInput)
When I use the lpcm setting above, I cannot open the video with any application. I have tried using kAudioFormatMPEG4AAC and kAudioFormatAppleLossless and I still get a corrupt video. In those cases I am able to view the video using QuickTime Player 8 (not QuickTime Player 7), but it is confused about the duration of the video and no sound is played.
When recording is complete I am calling:
func endRecording(_ completionHandler: @escaping () -> ()) {
    isRecording = false
    assetWriterVideoInput.markAsFinished()
    assetWriterAudioInput.markAsFinished()
    assetWriter.finishWriting(completionHandler: completionHandler)
}
This is how the AVCaptureSession is being configured:
func setupCapture() {
    captureSession = AVCaptureSession()
    if (captureSession == nil) {
        fatalError("ERROR: Couldnt create a capture session")
    }

    captureSession?.beginConfiguration()
    captureSession?.sessionPreset = AVCaptureSessionPreset1280x720

    let frontDevices = AVCaptureDevice.devices().filter{ ($0 as AnyObject).hasMediaType(AVMediaTypeVideo) && ($0 as AnyObject).position == AVCaptureDevicePosition.front }
    if let captureDevice = frontDevices.first as? AVCaptureDevice {
        do {
            let videoDeviceInput: AVCaptureDeviceInput
            do {
                videoDeviceInput = try AVCaptureDeviceInput(device: captureDevice)
            }
            catch {
                fatalError("Could not create AVCaptureDeviceInput instance with error: \(error).")
            }
            guard (captureSession?.canAddInput(videoDeviceInput))! else {
                fatalError()
            }
            captureSession?.addInput(videoDeviceInput)
        }
    }

    do {
        let audioDevice = AVCaptureDevice.defaultDevice(withMediaType: AVMediaTypeAudio)
        let audioDeviceInput: AVCaptureDeviceInput
        do {
            audioDeviceInput = try AVCaptureDeviceInput(device: audioDevice)
        }
        catch {
            fatalError("Could not create AVCaptureDeviceInput instance with error: \(error).")
        }
        guard (captureSession?.canAddInput(audioDeviceInput))! else {
            fatalError()
        }
        captureSession?.addInput(audioDeviceInput)
    }

    do {
        let dataOutput = AVCaptureVideoDataOutput()
        dataOutput.videoSettings = [kCVPixelBufferPixelFormatTypeKey as String : kCVPixelFormatType_32BGRA]
        dataOutput.alwaysDiscardsLateVideoFrames = true
        let queue = DispatchQueue(label: "com.3DTOPO.videosamplequeue")
        dataOutput.setSampleBufferDelegate(self, queue: queue)
        guard (captureSession?.canAddOutput(dataOutput))! else {
            fatalError()
        }
        captureSession?.addOutput(dataOutput)
        videoConnection = dataOutput.connection(withMediaType: AVMediaTypeVideo)
    }

    do {
        let audioDataOutput = AVCaptureAudioDataOutput()
        let queue = DispatchQueue(label: "com.3DTOPO.audiosamplequeue")
        audioDataOutput.setSampleBufferDelegate(self, queue: queue)
        guard (captureSession?.canAddOutput(audioDataOutput))! else {
            fatalError()
        }
        captureSession?.addOutput(audioDataOutput)
        audioConnection = audioDataOutput.connection(withMediaType: AVMediaTypeAudio)
    }

    captureSession?.commitConfiguration()
    // this will trigger capture on its own queue
    captureSession?.startRunning()
}
The AVCaptureVideoDataOutput delegate method:
func captureOutput(_ captureOutput: AVCaptureOutput!, didOutputSampleBuffer sampleBuffer: CMSampleBuffer!, from connection: AVCaptureConnection!) {
    // func captureOutput(captureOutput: AVCaptureOutput, sampleBuffer: CMSampleBuffer, connection:AVCaptureConnection) {
    var error: CVReturn
    if (connection == audioConnection) {
        delegate?.audioSampleUpdated(sampleBuffer: sampleBuffer)
        return
    }
    // ... Write video buffer ...//
}
Which calls:
func audioSampleUpdated(sampleBuffer: CMSampleBuffer) {
    if (isRecording) {
        while !assetWriterAudioInput.isReadyForMoreMediaData {}
        if (!assetWriterAudioInput.append(sampleBuffer)) {
            print("Unable to write to audio input")
        }
    }
}
If I disable the assetWriterAudioInput.append() call above, then the video isn't corrupt but of course I have no audio encoded. How can I get both video and audio encoding to work?
I figured it out. I was setting the assetWriter.startSession source time to 0, and then subtracting the start time from the current CACurrentMediaTime() when writing the pixel data.
I changed the assetWriter.startSession source time to CACurrentMediaTime() and no longer subtract the start time when writing the video frame.
Old start session code:
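A minimal sketch of what that earlier setup amounts to, based only on the description above and not necessarily the exact original code. The helper names, the `recordingStart` parameter, and the time scale of 240 are all assumptions made for illustration.

```swift
import AVFoundation

// Assumption: an illustrative time scale; the text above does not give one.
let zeroBasedTimeScale: CMTimeScale = 240

// Hypothetical helper: starts the writer session at source time 0.
// `assetWriter` is assumed to have its inputs added already.
func startSessionAtZero(_ assetWriter: AVAssetWriter) -> Bool {
    guard assetWriter.startWriting() else { return false }
    assetWriter.startSession(atSourceTime: CMTime(value: 0, timescale: zeroBasedTimeScale))
    return true
}

// Video frames were stamped relative to the recording start,
// i.e. the start time was subtracted from the current media time.
func rebasedFrameTime(now: CFTimeInterval, recordingStart: CFTimeInterval) -> CMTime {
    return CMTime(seconds: now - recordingStart, preferredTimescale: zeroBasedTimeScale)
}
```

With this arrangement the video timestamps start near 0, while the audio sample buffers are appended unmodified and so keep their capture timestamps. Those are presumably on the same clock as CACurrentMediaTime(), which would leave the two tracks far apart in time.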
New code that works:
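A minimal sketch of the corrected setup as described above, not necessarily the exact original code. The helper names and the time scale of 240 are assumptions made for illustration.

```swift
import AVFoundation
import QuartzCore

// Assumption: an illustrative time scale; the text above does not give one.
let hostTimeScale: CMTimeScale = 240

// Hypothetical helper: converts a media time in seconds to a CMTime without rebasing it.
func hostFrameTime(now: CFTimeInterval) -> CMTime {
    return CMTime(seconds: now, preferredTimescale: hostTimeScale)
}

// Hypothetical helper: starts the writer session at the current media time
// and returns that start time, or nil if the writer could not start.
// `assetWriter` is assumed to have its inputs added already.
func startSessionAtHostTime(_ assetWriter: AVAssetWriter) -> CMTime? {
    let startTime = hostFrameTime(now: CACurrentMediaTime())
    guard assetWriter.startWriting() else { return nil }
    assetWriter.startSession(atSourceTime: startTime)
    return startTime
}
```

Video frames are then stamped with `hostFrameTime(now: CACurrentMediaTime())` directly, so they share a timeline with the unmodified audio buffers. Another option, which I have not tested here, would be to start the session at the presentation timestamp of the first sample buffer, obtained with CMSampleBufferGetPresentationTimeStamp.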