No output in using wav file input with Microsoft S

Posted 2019-07-27 02:32

Question:

I am working on a project where I need to use speech recognition to convert speech (a conversation) from a WAV file to text. After trying CMU Sphinx for a while with terrible results, I am considering Microsoft SAPI (Speech API) 5.4.

I am writing a Visual Basic Windows application in Visual Studio. Here is my code snippet:

Private Sub Button1_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles Button1.Click
    '   Dim SAPI
    '   SAPI = CreateObject("sapi.spvoice")
    '   SAPI.Speak(TextBox1.Text)

    ' Create new recognizer
    Dim Recognizer As New SpInprocRecognizer

    ' create input file stream
    InputFile = New SpFileStream
    ' Defaults to open for read-only, and DoEvents false
    InputFile.Open(MY_WAVE_AUDIO_FILENAME)

    ' connect wav audio input to speech recognition engine
    Recognizer.AudioInputStream = InputFile

    ' create recognition context
    RecoContext = Recognizer.CreateRecoContext

    '  AddHandler RecoContext.Recognition, AddressOf RecoContext_Recognition

    ' create grammar
    Grammar = RecoContext.CreateGrammar
    ' ... and load dictation
    Grammar.DictationLoad()
    ' start dictating
    Grammar.DictationSetState(SGDSActive)
End Sub

MY_WAVE_AUDIO_FILENAME holds the file name with its full path. When I run this code by clicking the button, I don't get any output. I have used the following recognition handler:

Private Sub RecoContext_Recognition(ByVal StreamNumber As Long, ByVal StreamPosition As Object, ByVal RecognitionType As SpeechRecognitionType, ByVal Result As ISpeechRecoResult)
    ' Log/Report recognized phrase/information
    Console.WriteLine("Reached here......")
    TextBox1.Text = "Text should change"
End Sub

When I debug the application, execution never reaches the RecoContext_Recognition method. The input is a WAV file with 16 bits per sample, containing a 30-second conversation.

I am using the code from this link: http://msdn.microsoft.com/en-us/library/ee431813(v=vs.85).aspx

How can I diagnose the issue? I read somewhere that dictation requires training the speech recognition engine; if that is required in my case too, how can I do it? The link also mentions that the length of the input file needs to be specified, and I am not sure how to do that either. Any help is appreciated.

Answer 1:

The sample code is missing a few steps that need to be addressed.

1) Inproc recognizers need to bind an engine before they will do any recognitions at all;

2) The inproc recognizer needs to be set active before it will start processing audio.

You should also consider adding handlers for other events, in particular SPEI_START_SR_STREAM, SPEI_SOUND_START, SPEI_SOUND_END, and SPEI_PHRASE_START, to verify that the SR engine is processing audio at all and that it is attempting recognition.
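
For illustration, here is a rough sketch of how these points might look in VB.NET. It is not tested against the asker's setup and rests on several assumptions: the project references the "Microsoft Speech Object Library" COM component (namespace SpeechLib), the form has the Button1 and TextBox1 controls from the question, the first installed recognizer token is an acceptable engine, and the file path is a placeholder. In the automation interface, the SPEI_* events listed above should correspond to the StartStream, SoundStart, SoundEnd, and PhraseStart events of the recognition context. Note also that the handlers use Integer for StreamNumber, since the VB6-style Long in the MSDN sample is a 32-bit value.

```vb
Imports SpeechLib   ' assumed COM reference: Microsoft Speech Object Library

Public Class Form1
    ' WithEvents is what lets the Handles clauses below receive SAPI events.
    Private WithEvents RecoContext As SpInProcRecoContext
    Private Recognizer As SpInprocRecognizer
    Private Grammar As ISpeechRecoGrammar
    Private InputFile As SpFileStream

    ' Hypothetical placeholder path; substitute the real file.
    Private Const MY_WAVE_AUDIO_FILENAME As String = "C:\path\to\conversation.wav"

    Private Sub Button1_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles Button1.Click
        Recognizer = New SpInprocRecognizer()

        ' Point 1: bind an engine to the in-process recognizer.
        ' Picking the first installed token is an assumption.
        Dim engines As ISpeechObjectTokens = Recognizer.GetRecognizers()
        If engines.Count = 0 Then
            TextBox1.Text = "No SAPI recognition engine is installed."
            Return
        End If
        Recognizer.Recognizer = engines.Item(0)

        ' Connect the wav file as the audio input.
        InputFile = New SpFileStream()
        InputFile.Open(MY_WAVE_AUDIO_FILENAME)
        Recognizer.AudioInputStream = InputFile

        ' Create the context, then load and activate dictation.
        RecoContext = CType(Recognizer.CreateRecoContext(), SpInProcRecoContext)
        Grammar = RecoContext.CreateGrammar()
        Grammar.DictationLoad()
        Grammar.DictationSetState(SpeechRuleState.SGDSActive)

        ' Point 2: set the recognizer active so it starts processing audio.
        Recognizer.State = SpeechRecognizerState.SRSActive
    End Sub

    Private Sub RecoContext_Recognition(ByVal StreamNumber As Integer, ByVal StreamPosition As Object, ByVal RecognitionType As SpeechRecognitionType, ByVal Result As ISpeechRecoResult) Handles RecoContext.Recognition
        ' Append each recognized phrase to the text box.
        TextBox1.AppendText(Result.PhraseInfo.GetText() & Environment.NewLine)
    End Sub

    ' Diagnostic handlers: if none of these fire, the engine is not reading audio.
    Private Sub RecoContext_StartStream(ByVal StreamNumber As Integer, ByVal StreamPosition As Object) Handles RecoContext.StartStream
        System.Diagnostics.Debug.WriteLine("StartStream")
    End Sub

    Private Sub RecoContext_SoundStart(ByVal StreamNumber As Integer, ByVal StreamPosition As Object) Handles RecoContext.SoundStart
        System.Diagnostics.Debug.WriteLine("SoundStart")
    End Sub

    Private Sub RecoContext_SoundEnd(ByVal StreamNumber As Integer, ByVal StreamPosition As Object) Handles RecoContext.SoundEnd
        System.Diagnostics.Debug.WriteLine("SoundEnd")
    End Sub

    Private Sub RecoContext_PhraseStart(ByVal StreamNumber As Integer, ByVal StreamPosition As Object) Handles RecoContext.PhraseStart
        System.Diagnostics.Debug.WriteLine("PhraseStart")
    End Sub
End Class
```

If StartStream fires but SoundStart and PhraseStart never do, the engine is opening the stream but not detecting speech in it, which would point at the audio itself (for example its format) rather than at the event wiring.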



Tags: sapi