I am trying to process audio data in real time so that I can display an on-screen spectrum analyzer/visualization based on sound input from the microphone. I am using AVFoundation's AVCaptureAudioDataOutputSampleBufferDelegate to capture the audio data, which triggers the delegate function captureOutput. The function is below:
func captureOutput(_ output: AVCaptureOutput, didOutput sampleBuffer: CMSampleBuffer, from connection: AVCaptureConnection) {
    autoreleasepool {
        //Make sure the buffer's data is ready before reading it (the parameters are non-optional, so no nil checks needed)
        guard CMSampleBufferDataIsReady(sampleBuffer) else { return }
        //Check this is AUDIO (and not VIDEO) being received
        if connection.audioChannels.count > 0 {
            //Determine number of frames in buffer
            let numFrames = CMSampleBufferGetNumSamples(sampleBuffer)
            //Get AudioBufferList
            var audioBufferList = AudioBufferList(mNumberBuffers: 1, mBuffers: AudioBuffer(mNumberChannels: 0, mDataByteSize: 0, mData: nil))
            var blockBuffer: CMBlockBuffer?
            CMSampleBufferGetAudioBufferListWithRetainedBlockBuffer(
                sampleBuffer,
                bufferListSizeNeededOut: nil,
                bufferListOut: &audioBufferList,
                bufferListSize: MemoryLayout<AudioBufferList>.size,
                blockBufferAllocator: nil,
                blockBufferMemoryAllocator: nil,
                flags: UInt32(kCMSampleBufferFlag_AudioBufferList_Assure16ByteAlignment),
                blockBufferOut: &blockBuffer)
            let audioBuffers = UnsafeBufferPointer<AudioBuffer>(start: &audioBufferList.mBuffers, count: Int(audioBufferList.mNumberBuffers))
            for audioBuffer in audioBuffers {
                //Copy the raw bytes out of the AudioBuffer
                let data = Data(bytes: audioBuffer.mData!, count: Int(audioBuffer.mDataByteSize))
                //Reinterpret the bytes as 16-bit signed samples
                let i16array = data.withUnsafeBytes {
                    Array($0.bindMemory(to: Int16.self))
                }
                for dataItem in i16array {
                    print(dataItem)
                }
            }
        }
    }
}
The code above prints positive and negative numbers of type Int16 as expected, but I need help converting these raw numbers into meaningful data such as power and decibels for my visualizer.
I was on the right track... Thanks to RobertHarvey's comment on my question: using the Accelerate framework's FFT calculation functions is required to achieve a spectrum analyzer. But before you can use these functions, you need to convert your raw data into an Array of type Float, as many of the functions require a Float array.
Firstly, we load the raw data into a Data object:
//Read data from AudioBuffer into a variable
let data = Data(bytes: audioBuffer.mData!, count: Int(audioBuffer.mDataByteSize))
I like to think of a Data object as a "list" of 1-byte chunks of info (8 bits each), but if I check the number of frames I have in my sample and the total size of my Data object in bytes, they don't match:
//Get number of frames in sample and total size of Data
let numFrames = CMSampleBufferGetNumSamples(sampleBuffer) //= 1024 frames in my case
let dataSize = audioBuffer.mDataByteSize //= 2048 bytes in my case
The total size (in bytes) of my data is twice the number of frames I have in my CMSampleBuffer. This means that each frame of audio is 2 bytes in length. In order to read the data meaningfully, I need to convert my Data object, which is a "list" of 1-byte chunks, into an array of 2-byte chunks. Int16 contains 16 bits (or 2 bytes, exactly what we need), so let's create an Array of Int16:
//Convert to Int16 array (Array(...) copies the samples out, since the raw pointer is only valid inside the closure)
let samples = data.withUnsafeBytes {
    Array($0.bindMemory(to: Int16.self))
}
Now that we have an Array of Int16, we can convert it to an Array of Float:
//Convert to Float Array
let factor = Float(Int16.max)
var floats: [Float] = Array(repeating: 0.0, count: samples.count)
for i in 0..<samples.count {
    floats[i] = Float(samples[i]) / factor
}
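As a side note, the same conversion can also be done with vDSP's vectorised routines instead of a loop. This is just an untested sketch of the idea (not what I used), assuming samples is the [Int16] array from the previous step:

//Alternative sketch: vectorised Int16 -> Float conversion with vDSP
import Accelerate

var floats = [Float](repeating: 0, count: samples.count)
vDSP_vflt16(samples, 1, &floats, 1, vDSP_Length(samples.count)) //Int16 -> Float
var divisor = Float(Int16.max)
vDSP_vsdiv(floats, 1, &divisor, &floats, 1, vDSP_Length(samples.count)) //Scale down to the -1.0...1.0 range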
Now that we have our Float array, we can use the Accelerate framework's complex math to convert the raw Float values into meaningful ones like magnitude, decibels, etc. Links to documentation:
Apple's Accelerate Framework
Fast Fourier Transform (FFT)
I found Apple's documentation rather overwhelming. Luckily, I found a really good example online which I was able to re-purpose for my needs, called TempiFFT. Implementation as follows:
//Initiate FFT
let fft = TempiFFT(withSize: numFrames, sampleRate: 44100.0)
fft.windowType = TempiFFTWindowType.hanning
//Pass array of Floats
fft.fftForward(floats)
//I only want to display 20 bands on my analyzer
fft.calculateLinearBands(minFrequency: 0, maxFrequency: fft.nyquistFrequency, numberOfBands: 20)
//Then use a loop to iterate through the bands in your spectrum analyzer
var magnitudeArr = [Float](repeating: Float(0), count: 20)
var magnitudeDBArr = [Float](repeating: Float(0), count: 20)
for i in 0..<20 {
    magnitudeArr[i] = fft.magnitudeAtBand(i)
    magnitudeDBArr[i] = TempiFFT.toDB(fft.magnitudeAtBand(i))
    //..I didn't, but you could perform drawing functions here...
}
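If you would rather not depend on TempiFFT, the forward FFT and magnitude calculation can be sketched directly with vDSP. This is only a rough outline of the idea (not TempiFFT's actual code), assuming floats holds a power-of-two number of samples (1024 in my case):

//Sketch only: forward FFT + magnitudes using vDSP directly
import Accelerate
import Foundation

let n = floats.count //must be a power of two
let log2n = vDSP_Length(log2(Double(n)))
let fftSetup = vDSP_create_fftsetup(log2n, FFTRadix(kFFTRadix2))!

//Apply a Hann window to reduce spectral leakage
var window = [Float](repeating: 0, count: n)
vDSP_hann_window(&window, vDSP_Length(n), Int32(vDSP_HANN_NORM))
var windowed = [Float](repeating: 0, count: n)
vDSP_vmul(floats, 1, window, 1, &windowed, 1, vDSP_Length(n))

//Pack the real samples into the split-complex layout vDSP expects, run the FFT, then take squared magnitudes per bin
var realp = [Float](repeating: 0, count: n / 2)
var imagp = [Float](repeating: 0, count: n / 2)
var magnitudes = [Float](repeating: 0, count: n / 2)
realp.withUnsafeMutableBufferPointer { realPtr in
    imagp.withUnsafeMutableBufferPointer { imagPtr in
        var splitComplex = DSPSplitComplex(realp: realPtr.baseAddress!, imagp: imagPtr.baseAddress!)
        windowed.withUnsafeBytes { rawPtr in
            vDSP_ctoz(rawPtr.bindMemory(to: DSPComplex.self).baseAddress!, 2, &splitComplex, 1, vDSP_Length(n / 2))
        }
        vDSP_fft_zrip(fftSetup, &splitComplex, 1, log2n, FFTDirection(FFT_FORWARD))
        vDSP_zvmags(&splitComplex, 1, &magnitudes, 1, vDSP_Length(n / 2))
    }
}
vDSP_destroy_fftsetup(fftSetup)

//Convert power to decibels (10 * log10), with a small floor to avoid log of zero
let decibels = magnitudes.map { 10 * log10f(max($0, 1e-12)) }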
Other useful references:
Converting Data into Array of Int16
Converting Array of Int16 to Array of Float