
How can I obtain the raw audio frames from the mic

Published 2020-04-08 14:17

Question:

I am trying to extract MFCC vectors from the audio signal as input into a recurrent neural network. However, I am having trouble figuring out how to obtain the raw audio frames in Swift using Core Audio. Presumably, I have to go low-level to get that data, but I cannot find helpful resources in this area.

How can I get the audio signal information that I need using Swift?

Edit: This question was flagged as a possible duplicate of How to capture audio samples in iOS with Swift?. However, that particular question does not have the answer that I am looking for. Namely, the solution to that question is the creation of an AVAudioRecorder, which is a component, not the end result, of a solution to my question.

This question How to convert WAV/CAF file's sample data to byte array? is more in the direction of where I am headed. The solutions to that are written in Objective-C, and I am wondering if there is a way to do it in Swift.

Answer 1:

Attaching a tap to the default input node on AVAudioEngine is pretty straightforward and will get you real-time ~100ms chunks of audio from the microphone as Float32 arrays. You don't even have to connect any other audio units. If your MFCC extractor & network are sufficiently responsive this may be the easiest way to go.

import AVFoundation

let audioEngine = AVAudioEngine()
let inputNode = audioEngine.inputNode   // non-optional in current Swift

inputNode.installTap(onBus: 0,          // mono input
                     bufferSize: 1000,  // a request, not a guarantee
                     format: nil)       // no format translation
{ buffer, when in

    // This block is called repeatedly with successive buffers
    // of microphone data until you call audioEngine.stop()
    let actualSampleCount = Int(buffer.frameLength)

    // buffer.floatChannelData?[0][n] holds sample n of channel 0
    if let channelData = buffer.floatChannelData?[0] {
        for i in 0..<actualSampleCount {
            let val = channelData[i]
            // do something with each sample here...
            _ = val
        }
    }
}

do {
    try audioEngine.start()
} catch {
    print("Got an error starting audioEngine: \(error)")
}

You will need to request and obtain microphone permission as well.
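A minimal sketch of that permission request using AVAudioSession (you also need an NSMicrophoneUsageDescription entry in Info.plist, or the app will crash on first access):

```swift
import AVFoundation

// The system prompt appears only the first time; afterwards the stored
// decision is returned immediately.
AVAudioSession.sharedInstance().requestRecordPermission { granted in
    if granted {
        // safe to start AVAudioEngine here
    } else {
        // handle denial, e.g. direct the user to Settings
    }
}
```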

I find the amplitudes to be rather low, so you may need to apply some gain or normalization depending on your network's needs.
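One simple option is peak normalization, sketched below; `samples` is a hypothetical Float array copied out of the tap's buffer, and whether peak scaling (versus, say, RMS normalization or a fixed gain) suits your network is up to you:

```swift
// Scale the buffer so its largest absolute sample becomes 1.0.
func normalized(_ samples: [Float]) -> [Float] {
    guard let peak = samples.map({ abs($0) }).max(), peak > 0 else {
        return samples   // all-zero (silent) buffer: nothing to scale
    }
    return samples.map { $0 / peak }
}
```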

To process your WAV files, I'd try AVAssetReader, though I don't have code at hand for that.
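A rough, untested sketch of that idea: AVAssetReader can decode a file track to deinterleaved-free Float32 PCM via `outputSettings`, and the samples can then be copied out of each CMSampleBuffer. The function name and error handling here are placeholders:

```swift
import AVFoundation

func readPCMSamples(from url: URL) throws -> [Float] {
    let asset = AVAsset(url: url)
    guard let track = asset.tracks(withMediaType: .audio).first else {
        throw NSError(domain: "readPCMSamples", code: -1)   // no audio track
    }

    // Ask the reader to hand us interleaved 32-bit float linear PCM.
    let settings: [String: Any] = [
        AVFormatIDKey: kAudioFormatLinearPCM,
        AVLinearPCMBitDepthKey: 32,
        AVLinearPCMIsFloatKey: true,
        AVLinearPCMIsNonInterleaved: false
    ]
    let reader = try AVAssetReader(asset: asset)
    let output = AVAssetReaderTrackOutput(track: track, outputSettings: settings)
    reader.add(output)
    reader.startReading()

    var samples = [Float]()
    while let sampleBuffer = output.copyNextSampleBuffer(),
          let blockBuffer = CMSampleBufferGetDataBuffer(sampleBuffer) {
        let length = CMBlockBufferGetDataLength(blockBuffer)
        var chunk = [Float](repeating: 0, count: length / MemoryLayout<Float>.size)
        chunk.withUnsafeMutableBytes { rawBuffer in
            _ = CMBlockBufferCopyDataBytes(blockBuffer, atOffset: 0,
                                           dataLength: length,
                                           destination: rawBuffer.baseAddress!)
        }
        samples.append(contentsOf: chunk)
    }
    return samples
}
```

For a stereo file the returned array holds interleaved channel samples, so you may want to de-interleave or mix down before computing MFCCs.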