I'm using the Google Cloud Speech-to-Text API with a Node.js back-end. The app needs to be able to listen for voice commands and transmit them to the back-end as a buffer. For this, I need to send the buffer of the preceding audio when silence is detected.

Any help would be appreciated. I've included the JS code below.
if (!navigator.getUserMedia)
    navigator.getUserMedia = navigator.getUserMedia || navigator.webkitGetUserMedia ||
        navigator.mozGetUserMedia || navigator.msGetUserMedia;

if (navigator.getUserMedia) {
    navigator.getUserMedia({ audio: true }, success, function (e) {
        alert('Error capturing audio.');
    });
} else {
    alert('getUserMedia not supported in this browser.');
}

var recording = false;

window.startRecording = function () {
    recording = true;
};

window.stopRecording = function () {
    recording = false;
    // window.Stream.end();
};

function success(e) {
    audioContext = window.AudioContext || window.webkitAudioContext;
    context = new audioContext();
    // the sample rate is in context.sampleRate
    audioInput = context.createMediaStreamSource(e);
    var bufferSize = 4096;
    recorder = context.createScriptProcessor(bufferSize, 1, 1);
    recorder.onaudioprocess = function (e) {
        if (!recording) return;
        console.log('recording');
        var left = e.inputBuffer.getChannelData(0);
        console.log(convertoFloat32ToInt16(left));
    };
    audioInput.connect(recorder);
    recorder.connect(context.destination);
}
I'm not entirely sure what exactly is being asked in the question, so this answer is only intended to give a way to detect silences in an AudioStream.
To detect silence in an AudioStream, you can use an AnalyserNode, on which you call the getByteFrequencyData method at regular intervals and check whether there were sounds higher than your expected level for a given time. You can set the threshold level directly with the minDecibels property of the AnalyserNode. (There is also a fiddle version, since Stack Snippets may block gUM.)
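Below is a minimal sketch of that approach; the detectSilence helper name, the -80 dB threshold and the 500 ms silence window are assumptions you would tune for your own use case:

function detectSilence(stream, onSilence, onSound, silenceDelay = 500, minDecibels = -80) {
    // note: some browsers keep a new AudioContext suspended until a user gesture
    const ctx = new (window.AudioContext || window.webkitAudioContext)();
    const analyser = ctx.createAnalyser();
    analyser.minDecibels = minDecibels; // bins quieter than this read as 0
    ctx.createMediaStreamSource(stream).connect(analyser);

    const data = new Uint8Array(analyser.frequencyBinCount);
    let silenceStart = performance.now();
    let triggered = false; // avoid firing onSilence repeatedly

    (function loop(time) {
        requestAnimationFrame(loop);
        analyser.getByteFrequencyData(data);
        if (data.some(v => v > 0)) { // some frequency content above the threshold
            if (triggered) { triggered = false; onSound(); }
            silenceStart = time;
        } else if (!triggered && time - silenceStart > silenceDelay) {
            triggered = true;
            onSilence();
        }
    })(performance.now());
}

You would call it with the stream from getUserMedia, for example:

navigator.mediaDevices.getUserMedia({ audio: true })
    .then(stream => detectSilence(stream, () => console.log('silence'), () => console.log('sound')))
    .catch(e => console.error(e));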
You can use the SpeechRecognition result event to determine when a word or phrase has been recognized, for example ls, cd, pwd or other commands; pass the .transcript of the SpeechRecognitionAlternative to speechSynthesis.speak(); at the attached start and end events of the SpeechSynthesisUtterance, call .start() or .resume() on a MediaRecorder object to which the MediaStream is passed; then convert the Blob at the dataavailable event to an ArrayBuffer using FileReader or Response.arrayBuffer().
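A rough sketch of that flow might look like the following; webkitSpeechRecognition is assumed for Chrome, and the "/upload" endpoint is only a placeholder for your Node.js route:

const Recognition = window.SpeechRecognition || window.webkitSpeechRecognition;

navigator.mediaDevices.getUserMedia({ audio: true }).then(stream => {
    const recorder = new MediaRecorder(stream);
    const recognition = new Recognition();
    recognition.continuous = true;

    recorder.ondataavailable = async e => {
        // convert the recorded Blob to an ArrayBuffer for the back-end
        const arrayBuffer = await new Response(e.data).arrayBuffer();
        fetch('/upload', { method: 'POST', body: arrayBuffer }); // placeholder endpoint
    };

    recognition.onresult = event => {
        const transcript = event.results[event.results.length - 1][0].transcript;
        const utterance = new SpeechSynthesisUtterance(transcript);
        // record only while the utterance is being spoken
        utterance.onstart = () => recorder.start();
        utterance.onend = () => recorder.stop();
        speechSynthesis.speak(utterance);
    };

    recognition.start();
});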
We could alternatively use the audiostart or soundstart with the audioend or soundend events of SpeechRecognition to record the user's actual voice, though the ends may not be fired consistently in relation to the actual start and end of audio captured by only a standard system microphone.

plnkr https://plnkr.co/edit/4DVEg6mhFRR94M5gdaIp?p=preview
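Reusing the recognition and recorder objects from the sketch above, that alternative would look roughly like this; the exact timing of soundend/audioend is browser dependent:

recognition.onsoundstart = () => {
    if (recorder.state === 'inactive') recorder.start();
};
recognition.onsoundend = () => {
    if (recorder.state === 'recording') recorder.stop();
};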
The simplest approach would be to use the .pause() and .resume(), .stop() methods of MediaRecorder() to allow the user to start, pause, and stop recording the audio captured with navigator.mediaDevices.getUserMedia(), and to convert the resulting Blob to an ArrayBuffer, if that is what the API expects to be POSTed to the server.

plnkr https://plnkr.co/edit/7caWYMsvub90G6pwDdQp?p=preview
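A minimal sketch of that approach, assuming three buttons with ids start, pause and stop exist in the page and that "/upload" is a placeholder route on the Node.js back-end:

navigator.mediaDevices.getUserMedia({ audio: true }).then(stream => {
    const recorder = new MediaRecorder(stream);
    const chunks = [];

    recorder.ondataavailable = e => chunks.push(e.data);
    recorder.onstop = async () => {
        const blob = new Blob(chunks, { type: recorder.mimeType });
        chunks.length = 0; // reset for the next recording
        const arrayBuffer = await new Response(blob).arrayBuffer();
        fetch('/upload', { method: 'POST', body: arrayBuffer }); // placeholder endpoint
    };

    document.getElementById('start').onclick = () => {
        if (recorder.state === 'inactive') recorder.start();
        else if (recorder.state === 'paused') recorder.resume();
    };
    document.getElementById('pause').onclick = () => {
        if (recorder.state === 'recording') recorder.pause();
    };
    document.getElementById('stop').onclick = () => {
        if (recorder.state !== 'inactive') recorder.stop();
    };
});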