Managing text-to-speech and speech recognition at

Posted 2020-07-26 06:23

Question:

I'd like my iOS app to use text-to-speech to read out information that it receives from a server, and I'd also like to let the user stop that speech with a voice command. I have tried speech recognition frameworks for iOS such as OpenEars, and the problem I find is that the recognizer listens to and detects what the app itself is "saying", which interferes with the recognition of the user's voice commands.

Has anybody dealt with this scenario on iOS and found a solution? Thanks in advance.

Answer 1:

It is not a trivial thing to implement. Unfortunately, on iOS and other platforms the microphone also records the sound that is playing through the speaker. The only straightforward choice is to use a headset; in that case speech recognition can keep listening for input. In OpenEars, recognition is disabled during TTS unless a headset is plugged in.

If you still want to implement this feature, which is called "barge-in", you have to do the following:

1. Store the audio you play through the speaker, so that you have a reference copy of what the microphone will pick up.
2. Implement a noise cancellation algorithm that effectively removes that audio from the recording. You can use cross-correlation to find the proper offset of the played audio within the recording, and spectral subtraction to remove it.
3. Recognize the speech in the remaining signal.
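
As a rough illustration of the cross-correlation and spectral subtraction steps above, here is a minimal sketch in plain Python. It is not OpenEars code and not iOS code: the function names (`find_offset`, `spectral_subtract`) and the synthetic sine-wave signals are made up for the example, and it assumes an idealized case in which the played audio reaches the microphone unchanged. On a real device the speaker, room and microphone alter the signal, so the subtraction would have to work frame by frame with an estimated gain.

```python
import cmath
import math


def find_offset(recording, reference):
    """Return the lag (in samples) where `reference` best lines up with
    `recording`, found by brute-force cross-correlation."""
    best_lag, best_score = 0, float("-inf")
    for lag in range(len(recording) - len(reference) + 1):
        score = sum(recording[lag + i] * r for i, r in enumerate(reference))
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def dft(frame):
    """Naive discrete Fourier transform (fine for a short demo frame)."""
    n = len(frame)
    return [sum(x * cmath.exp(-2j * cmath.pi * k * t / n)
                for t, x in enumerate(frame))
            for k in range(n)]


def idft(spectrum):
    """Naive inverse DFT, returning the real part of each sample."""
    n = len(spectrum)
    return [sum(c * cmath.exp(2j * cmath.pi * k * t / n)
                for k, c in enumerate(spectrum)).real / n
            for t in range(n)]


def spectral_subtract(mixed_frame, echo_frame):
    """Subtract the echo's magnitude spectrum from the mixed frame,
    keeping the phase of the mixed signal and clamping at zero."""
    cleaned = []
    for m, e in zip(dft(mixed_frame), dft(echo_frame)):
        magnitude = max(abs(m) - abs(e), 0.0)
        cleaned.append(cmath.rect(magnitude, cmath.phase(m)))
    return idft(cleaned)


# Synthetic demo (all signals are assumptions made for illustration).
n = 64
tts = [math.sin(2 * math.pi * 5 * t / n) for t in range(n)]          # what the app plays
user = [0.5 * math.sin(2 * math.pi * 12 * t / n) for t in range(n)]  # what the user says
# The microphone hears silence, then TTS and user mixed, then silence.
recording = [0.0] * 10 + [a + b for a, b in zip(tts, user)] + [0.0] * 6

lag = find_offset(recording, tts)
cleaned = spectral_subtract(recording[lag:lag + n], tts)
print("estimated offset:", lag)
print("largest residual vs. user signal:",
      max(abs(c - u) for c, u in zip(cleaned, user)))
```

The cleaned frame is what you would then pass to the recognizer. A production version would use an FFT library and overlapping windows instead of the naive transform used here.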

It is not possible to do that without significant modification of the OpenEars sources.

A related question is "Android Speech Recognition while music is playing".