I have a situation with a React-based app where I have an input for which I wanted to allow voice input as well. I'm okay making this compatible with Chrome and Firefox only, so I was thinking of using getUserMedia. I know I'll be using Google Cloud's Speech to Text API. However, I have a few caveats:
- I want this to stream my audio data live, not just when I'm done recording. This means that a lot of solutions I've found won't work very well, because it's not sufficient to save the file and then send it out to Google Cloud Speech.
- I don't trust my front end with my Google Cloud API credentials. Instead, I already have a service running on the back end which has my credentials, and I want to stream the audio (live) to that back end, then from that back end stream to Google Cloud, and then emit transcript updates back to the front end as they come in.
- I already connect to that back end service using socket.io, and I want to manage this entirely via sockets, without having to use Binary.js or anything similar.
Nowhere seems to have a good tutorial on how to do this. What do I do?
First, credit where credit is due: a huge amount of my solution here was created by referencing vin-ni's Google-Cloud-Speech-Node-Socket-Playground project. I had to adapt it somewhat for my React app, however, so I'm sharing a few of the changes I made.
My solution here was composed of four parts, two on the front end and two on the back end.
My front end solution had two parts:
- A utility file that captures the audio and emits it to the socket
- A React component that uses that utility file
My back end solution had two parts:
- Socket event handlers in the main.js file
- A speechToTextUtils file
(These don't need to be separated by any means; our main.js file is just already a behemoth without it.)
Most of my code will just be excerpted, but my utilities will be shown in full because I had a lot of problems with all of the stages involved. My front end utility file looked like this:
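As a minimal sketch of the central step in such a utility (not the original file), the conversion to Int16 and the socket emission might look like the following. The names floatTo16BitPCM and attachSpeechEmitter are hypothetical; processor is assumed to be a ScriptProcessorNode fed by the getUserMedia stream, and socket is assumed to be an already connected socket.io client. Only the speechData event name comes from this answer.

```javascript
// Convert Web Audio samples (floats in [-1, 1]) to 16-bit signed PCM.
function floatTo16BitPCM(float32Samples) {
  const pcm = new Int16Array(float32Samples.length);
  for (let i = 0; i < float32Samples.length; i++) {
    // Clamp first so out-of-range samples cannot wrap around.
    const s = Math.max(-1, Math.min(1, float32Samples[i]));
    // Negative and positive halves scale differently: -32768..32767.
    pcm[i] = s < 0 ? s * 0x8000 : s * 0x7fff;
  }
  return pcm;
}

// Hypothetical wiring: every audio callback converts channel 0 and
// emits the raw bytes to the back end as a speechData event.
function attachSpeechEmitter(processor, socket) {
  processor.onaudioprocess = (event) => {
    const samples = event.inputBuffer.getChannelData(0); // mono input
    socket.emit('speechData', floatTo16BitPCM(samples).buffer);
  };
}
```

Keeping the conversion in a pure function makes it easy to check outside the browser. Note that ScriptProcessorNode is deprecated in favor of AudioWorklet, so this wiring is best treated as a starting point.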
The main salient point of this code (aside from the getUserMedia configuration, which was in and of itself a bit dicey) is that the onaudioprocess callback for the processor emitted speechData events to the socket with the data after converting it to Int16. My main changes here from my linked reference above were to replace all of the functionality to actually update the DOM with callback functions (used by my React component) and to add some error handling that wasn't included in the source.
I was then able to access this in my React Component by just using:
(I passed in my actual data handler as a prop to this component).
Then on the back end, my service handled three main events in main.js:
My speechToTextUtils then looked like:
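A rough sketch of how the handlers and the utility could fit together is below; it is not the original code. Assumptions: the speechClient is injected (in a real service it would likely be a SpeechClient from @google-cloud/speech), the function names and the startStream, endStream, speechTranscript, and speechError event names are hypothetical, and the 16000 Hz sample rate and en-US language are placeholders that must match the audio the front end actually sends.

```javascript
// Sketch of a speechToTextUtils-style module with an injected client.
function createSpeechToTextUtils(speechClient) {
  let recognizeStream = null;

  return {
    // Open a streaming request and relay results back over the socket.
    startRecognitionStream(socket) {
      recognizeStream = speechClient
        .streamingRecognize({
          config: {
            encoding: 'LINEAR16', // 16-bit PCM, as sent by the front end
            sampleRateHertz: 16000, // assumption: must match the audio
            languageCode: 'en-US', // assumption
          },
          interimResults: true, // emit partial transcripts as they arrive
        })
        .on('error', (err) => socket.emit('speechError', String(err)))
        .on('data', (data) => {
          const result = data.results && data.results[0];
          if (result && result.alternatives && result.alternatives[0]) {
            socket.emit('speechTranscript', {
              transcript: result.alternatives[0].transcript,
              isFinal: Boolean(result.isFinal),
            });
          }
        });
    },
    // Forward one audio chunk; ignored if no stream is open.
    receiveData(chunk) {
      if (recognizeStream) recognizeStream.write(Buffer.from(chunk));
    },
    // Close the request so Google Cloud can finalize the transcript.
    stopRecognitionStream() {
      if (recognizeStream) recognizeStream.end();
      recognizeStream = null;
    },
  };
}

// Hypothetical main.js wiring for the three socket events.
function registerSpeechHandlers(socket, utils) {
  socket.on('startStream', () => utils.startRecognitionStream(socket));
  socket.on('speechData', (chunk) => utils.receiveData(chunk));
  socket.on('endStream', () => utils.stopRecognitionStream());
}
```

Injecting the client keeps the credentials handling out of this module and lets the flow be exercised with a fake client. As written, the sketch keeps one stream per utils instance, so each socket connection would need its own instance.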
(Again, you don't strictly need this util file, and you could certainly put the speechClient as a const on top of the file depending on how you get your credentials; this is just how I implemented it.)
And that, finally, should be enough to get you started on this. I encourage you to do your best to understand this code before you reuse or modify it, as it may not work 'out of the box' for you, but unlike all other sources I have found, this should get you at least started on all involved stages of the project. It is my hope that this answer will prevent others from suffering like I have suffered.