Question:

Here is an idea:

We have web applications that expose RESTful APIs which accept JSON. How about using the Google Speech APIs to take the user's voice input, convert it to text, somehow translate that text into the JSON required by our APIs, and then call those application APIs with that JSON? Is there any library that translates text into a specified JSON format? Has anybody used this approach?

Answer 1:

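This is called "intent analysis": the library classifies what the user wants (the intent) and pulls out the relevant parameters (the entities). There are libraries for this, for example Rasa.

For example, if your input is "show me chinese restaurants", the output would be: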

{
  "text": "show me chinese restaurants",
  "intent": "restaurant_search",
  "entities": [
    {
      "start": 8,
      "end": 15,
      "value": "chinese",
      "entity": "cuisine"
    }
  ]
}

Overall, this is fairly advanced NLU (natural language understanding).
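
To connect this back to the question, here is a minimal sketch of how an intent result shaped like the example above could be turned into a request for one of your own APIs. The route table, the endpoint path `/api/restaurants/search`, and the field names are all hypothetical placeholders, and the exact shape of the NLU output may differ depending on the library and version you use.

```javascript
// Hypothetical table: intent name -> endpoint and entity-to-field mapping.
// None of these paths or field names come from a real API.
const INTENT_ROUTES = {
  restaurant_search: {
    method: "POST",
    path: "/api/restaurants/search",
    fields: { cuisine: "cuisine" },
  },
};

// Build a request description from an NLU result shaped like the example above.
function buildApiRequest(nlu) {
  if (!Object.hasOwn(INTENT_ROUTES, nlu.intent)) {
    throw new Error(`No API mapped to intent: ${nlu.intent}`);
  }
  const route = INTENT_ROUTES[nlu.intent];
  const body = {};
  for (const { entity, value } of nlu.entities ?? []) {
    // Copy only the entities this route knows about; ignore the rest.
    if (Object.hasOwn(route.fields, entity)) {
      body[route.fields[entity]] = value;
    }
  }
  return { method: route.method, path: route.path, body };
}

const nluResult = {
  text: "show me chinese restaurants",
  intent: "restaurant_search",
  entities: [{ start: 8, end: 15, value: "chinese", entity: "cuisine" }],
};

console.log(JSON.stringify(buildApiRequest(nluResult)));
// prints {"method":"POST","path":"/api/restaurants/search","body":{"cuisine":"chinese"}}
```

Keeping the mapping in a table means that supporting a new API is a matter of adding one entry, while the NLU model is trained separately to recognize the new intent.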

Answer 2:

According to the Google Speech API documentation, the result is already returned as JSON:

{
  "results": [
    {
      "alternatives": [
        {
          "transcript": "how old is the Brooklyn Bridge",
          "confidence": 0.98267895
        }
      ]
    }
  ]
}

All you would have to do is use JSON.parse and then select whatever you want out of the resulting object to put into your specific JSON format.

I would suggest reading through the Google Speech documentation.
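
As a rough sketch of that step, assuming the response body has the shape shown above and that the first alternative of each result is the one you want, the parsing could look like this. The function names and the `query` field of the target payload are made up for illustration.

```javascript
// Parse the raw response text and join the first alternative of each result.
// Returns an empty string when the response contains no results.
function extractTranscript(responseText) {
  const response = JSON.parse(responseText);
  return (response.results ?? [])
    .map((result) => (result.alternatives?.[0]?.transcript ?? "").trim())
    .filter((text) => text.length > 0)
    .join(" ");
}

// Hypothetical target format: your own API may expect different field names.
function toSearchPayload(responseText) {
  return { query: extractTranscript(responseText) };
}

const sampleResponse = JSON.stringify({
  results: [
    {
      alternatives: [
        { transcript: "how old is the Brooklyn Bridge", confidence: 0.98267895 },
      ],
    },
  ],
});

console.log(JSON.stringify(toSearchPayload(sampleResponse)));
// prints {"query":"how old is the Brooklyn Bridge"}
```

Note that this only moves the transcript into another JSON envelope. Deciding which API to call and which parameters to fill from free-form text still needs some form of intent extraction.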