I am trying to process audio data in real-time so that I can display an on-screen spectrum analyzer/visualization based on sound input from the microphone. I am using AVFoundation's AVCaptureAudioDataOutputSampleBufferDelegate to capture the audio data, which triggers the delegate function captureOutput. Function below:
func captureOutput(_ output: AVCaptureOutput, didOutput sampleBuffer: CMSampleBuffer, from connection: AVCaptureConnection) {
    autoreleasepool {
        // The delegate parameters are non-optional, so only the buffer's readiness needs checking
        guard CMSampleBufferDataIsReady(sampleBuffer) else { return }

        // Check this is AUDIO (and not VIDEO) being received
        if connection.audioChannels.count > 0 {
            // Determine number of frames in buffer
            let numFrames = CMSampleBufferGetNumSamples(sampleBuffer)

            // Get AudioBufferList
            var audioBufferList = AudioBufferList(mNumberBuffers: 1, mBuffers: AudioBuffer(mNumberChannels: 0, mDataByteSize: 0, mData: nil))
            var blockBuffer: CMBlockBuffer?
            CMSampleBufferGetAudioBufferListWithRetainedBlockBuffer(
                sampleBuffer,
                bufferListSizeNeededOut: nil,
                bufferListOut: &audioBufferList,
                bufferListSize: MemoryLayout<AudioBufferList>.size,
                blockBufferAllocator: nil,
                blockBufferMemoryAllocator: nil,
                flags: UInt32(kCMSampleBufferFlag_AudioBufferList_Assure16ByteAlignment),
                blockBufferOut: &blockBuffer)

            let audioBuffers = UnsafeBufferPointer<AudioBuffer>(start: &audioBufferList.mBuffers, count: Int(audioBufferList.mNumberBuffers))
            for audioBuffer in audioBuffers {
                let data = Data(bytes: audioBuffer.mData!, count: Int(audioBuffer.mDataByteSize))
                // Core Audio delivers little-endian PCM on iOS/macOS, so read the
                // 2-byte samples as little-endian (bigEndian would byte-swap them)
                let i16array = data.withUnsafeBytes { rawBuffer in
                    rawBuffer.bindMemory(to: Int16.self).map(Int16.init(littleEndian:))
                }
                for dataItem in i16array {
                    print(dataItem)
                }
            }
        }
    }
}
The code above prints positive and negative numbers of type Int16 as expected, but I need help converting these raw numbers into meaningful data such as power and decibels for my visualizer.
I was on the right track... Thanks to RobertHarvey's comment on my question - use of the Accelerate Framework's FFT calculation functions is required to achieve a spectrum analyzer. But even before I could use these functions, I needed to convert my raw data into an Array of type Float, as many of the functions require a Float array.

Firstly, we load the raw data into a Data object:
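Reusing the buffer variables from the delegate callback above:

let data = Data(bytes: audioBuffer.mData!, count: Int(audioBuffer.mDataByteSize))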
I like to think of a Data object as a "list" of 1-byte sized chunks of info (8 bits each), but if I check the number of frames I have in my sample and the total size of my Data object in bytes, they don't match:
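For example, printing both inside the callback (the figures shown are illustrative):

print("Number of frames: \(numFrames)")         // e.g. 1024
print("Size of data in bytes: \(data.count)")   // e.g. 2048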
The total size (in bytes) of my data is twice the number of frames I have in my CMSampleBuffer. This means that each frame of audio is 2 bytes in length. In order to read the data meaningfully, I need to convert my Data object, which is a "list" of 1-byte chunks, into an array of 2-byte chunks. Int16 contains 16 bits (or 2 bytes - exactly what we need), so let's create an Array of Int16:
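One way to do this, mirroring the question code and reading the samples as the little-endian PCM that Core Audio delivers:

let i16array = data.withUnsafeBytes { rawBuffer in
    rawBuffer.bindMemory(to: Int16.self).map(Int16.init(littleEndian:))
}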
Now that we have an Array of Int16, we can convert it to an Array of Float:
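A plain map is enough here; dividing by Int16.max (an optional step I've added) normalises the samples to the -1.0...1.0 range that many DSP routines expect:

// Convert each 2-byte sample to Float, normalised to -1.0...1.0
let floatArray = i16array.map { Float($0) / Float(Int16.max) }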
Now that we have our Float array, we can use the Accelerate Framework's complex math to convert the raw Float values into meaningful ones like magnitude, decibels etc. Links to documentation:

Apple's Accelerate Framework
Fast Fourier Transform (FFT)
I found Apple's documentation rather overwhelming. Luckily, I found a really good example online which I was able to re-purpose for my needs, called TempiFFT. Implementation as follows:
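(TempiFFT's actual source is worth reading in full; what follows is only a minimal sketch of the same vDSP pipeline it wraps - the function name, the Hann window and the power-of-two frame assumption are my own illustrative choices.)

import Accelerate
import Foundation

// Minimal sketch: compute magnitude and decibel spectra from a Float buffer
// using vDSP. Assumes samples.count is a power of two (e.g. 1024).
func spectrum(from samples: [Float]) -> (magnitudes: [Float], decibels: [Float]) {
    let n = samples.count
    let log2n = vDSP_Length(log2(Double(n)))
    let halfN = n / 2

    guard let fftSetup = vDSP_create_fftsetup(log2n, FFTRadix(kFFTRadix2)) else {
        return ([], [])
    }
    defer { vDSP_destroy_fftsetup(fftSetup) }

    // Apply a Hann window to reduce spectral leakage at the frame edges
    var window = [Float](repeating: 0, count: n)
    vDSP_hann_window(&window, vDSP_Length(n), Int32(vDSP_HANN_NORM))
    var windowed = [Float](repeating: 0, count: n)
    vDSP_vmul(samples, 1, window, 1, &windowed, 1, vDSP_Length(n))

    var realp = [Float](repeating: 0, count: halfN)
    var imagp = [Float](repeating: 0, count: halfN)
    var magnitudes = [Float](repeating: 0, count: halfN)

    realp.withUnsafeMutableBufferPointer { realPtr in
        imagp.withUnsafeMutableBufferPointer { imagPtr in
            var splitComplex = DSPSplitComplex(realp: realPtr.baseAddress!,
                                               imagp: imagPtr.baseAddress!)

            // Pack the real samples into split-complex form (even/odd interleave)
            windowed.withUnsafeBytes { rawBuffer in
                let complexBuffer = rawBuffer.bindMemory(to: DSPComplex.self)
                vDSP_ctoz(complexBuffer.baseAddress!, 2, &splitComplex, 1, vDSP_Length(halfN))
            }

            // In-place forward FFT
            vDSP_fft_zrip(fftSetup, &splitComplex, 1, log2n, FFTDirection(FFT_FORWARD))

            // Squared magnitude (power) of each frequency bin
            vDSP_zvmags(&splitComplex, 1, &magnitudes, 1, vDSP_Length(halfN))
        }
    }

    // Convert power to decibels relative to a reference of 1.0
    var reference: Float = 1.0
    var decibels = [Float](repeating: 0, count: halfN)
    vDSP_vdbcon(magnitudes, 1, &reference, &decibels, 1, vDSP_Length(halfN), 1)

    return (magnitudes, decibels)
}

Feeding it the floatArray from the previous step gives one magnitude and one decibel value per frequency bin, which is what the visualizer draws its bars from:

let (magnitudes, decibels) = spectrum(from: floatArray)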
Other useful references:
Converting Data into Array of Int16
Converting Array of Int16 to Array of Float