Voice recognition fails to work when the voice is being recorded

Posted 2019-03-27 08:31

Question:

I am working on a feature where pressing a button launches voice recognition and, at the same time, records what the user says. The code is as follows:

    button_start.setOnTouchListener( new View.OnTouchListener() 
    {
        @Override
        public boolean onTouch(View arg0, MotionEvent event) 
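        {   
            // Assumed: the opening of this block is missing from the post;
            // ACTION_DOWN mirrors the ACTION_UP check further down
            if (event.getAction() == MotionEvent.ACTION_DOWN)
            {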
                if (pressed == false)
                {
                    Intent intent = new Intent(RecognizerIntent.ACTION_RECOGNIZE_SPEECH);        
                    intent.putExtra(RecognizerIntent.EXTRA_LANGUAGE_MODEL,RecognizerIntent.LANGUAGE_MODEL_FREE_FORM);
                    intent.putExtra(RecognizerIntent.EXTRA_CALLING_PACKAGE,"voice.recognition.test");
                    intent.putExtra(RecognizerIntent.EXTRA_LANGUAGE, "zh-HK");
                    intent.putExtra(RecognizerIntent.EXTRA_MAX_RESULTS,1); 
                    sr.startListening(intent);
                    Log.i("111111","11111111");
                    pressed = true;
                }

                recordAudio();

            }

            if((event.getAction()==MotionEvent.ACTION_UP || event.getAction()==MotionEvent.ACTION_CANCEL))
            {                   
                stopRecording();
            }
            return false;
        }
    });             
}

   public void recordAudio()
   {
      isRecording = true;   
      try 
      {
          mediaRecorder = new MediaRecorder();
          mediaRecorder.setAudioSource(MediaRecorder.AudioSource.MIC);
          mediaRecorder.setOutputFormat(MediaRecorder.OutputFormat.THREE_GPP);
          mediaRecorder.setOutputFile(audioFilePath);
          mediaRecorder.setAudioEncoder(MediaRecorder.AudioEncoder.AMR_NB);
          mediaRecorder.prepare();
      } 
      catch (Exception e) 
      {
          e.printStackTrace();
      }
      mediaRecorder.start();            
   }    

   public void stopRecording()
   {            
       if (isRecording)
       {    
           mediaRecorder.stop();
           mediaRecorder.reset();    // set state to idle
           mediaRecorder.release();
           mediaRecorder = null;
           isRecording = false;
       }
       else 
       {
           mediaPlayer.reset();      // reset first: no calls are valid after release()
           mediaPlayer.release();
           mediaPlayer = null;
       }
   }




class listener implements RecognitionListener          
{
    // standard codes onReadyForSpeech, onBeginningOfSpeech, etc
}

Questions:

I built the app step by step. At first it had no recording function, and voice recognition worked perfectly.

After testing many times and confirming that voice recognition was fine, I started to add recording using MediaRecorder.

When I then tested, the ERROR3 AUDIO message (error code 3, SpeechRecognizer.ERROR_AUDIO) appeared as soon as button_start was pressed, even before I tried to speak.

I played back the voice recording. The voice was recorded and saved properly.

What is happening? Why can't I record while using voice recognition at the same time?

Thanks!

Answer 1:

--EDIT-- module for Opus-Record WHILE Speech-Recognition also runs

--EDIT-- 'V1BETA1' streaming, continuous recognition works with a minor change to the sample project. Alter 'readData()' so that the raw PCM in 'sData' is shared by 2 threads (the fileSink thread and the recognizerAPI thread from the sample project). For the sink, just hook up an encoder using a PCM stream refreshed at each 'sData' IO. Remember to close the stream and it will work. Review 'writeAudiaDataToFile()' for more on the fileSink.
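
A minimal sketch of that two-thread sharing, in plain Java so it runs off-device. Everything here is an assumption for illustration: PcmFanOut, fanOut and the two sinks are hypothetical names, not code from the sample project. On Android the source would be the loop that fills 'sData' from the mic, and the sinks would be the recognizer API and your encoder.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.Arrays;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical helper: one reader loop hands a copy of every PCM chunk
// to two consumer threads (recognizer branch and file-sink branch).
final class PcmFanOut {
    private static final byte[] END = new byte[0]; // end-of-stream marker

    // Starts a thread that drains the queue into the sink until END arrives.
    private static Thread consumer(String name, BlockingQueue<byte[]> queue, OutputStream sink) {
        Thread t = new Thread(() -> {
            try {
                for (byte[] chunk = queue.take(); chunk != END; chunk = queue.take()) {
                    sink.write(chunk);
                }
                sink.close(); // close the stream once the last chunk is written
            } catch (IOException e) {
                e.printStackTrace();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, name);
        t.start();
        return t;
    }

    static void fanOut(InputStream pcmSource, OutputStream recognizerSink,
                       OutputStream fileSink, int chunkSize)
            throws IOException, InterruptedException {
        BlockingQueue<byte[]> toRecognizer = new LinkedBlockingQueue<>();
        BlockingQueue<byte[]> toFile = new LinkedBlockingQueue<>();
        Thread recognizer = consumer("recognizer", toRecognizer, recognizerSink);
        Thread file = consumer("fileSink", toFile, fileSink);

        byte[] sData = new byte[chunkSize];
        int n;
        while ((n = pcmSource.read(sData)) != -1) {
            // sData is reused on the next read, so each thread gets its own copy
            toRecognizer.put(Arrays.copyOf(sData, n));
            toFile.put(Arrays.copyOf(sData, n));
        }
        toRecognizer.put(END);
        toFile.put(END);
        recognizer.join();
        file.join();
    }
}
```

Copying each chunk costs a little memory, but it means neither thread can see 'sData' being overwritten by the next read.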

--EDIT-- see this thread

There is going to be a basic conflict over the HAL and the microphone buffer when you try to do:

    speechRecognizer.startListening(recognizerIntent); // <-- needs exclusive use of the mic

and

    mediaRecorder.start(); // <-- needs exclusive use of the mic

You can only choose one or the other of the above actions to own the audio APIs underlying the mic!

If you want to mimic the functionality of Google Keep, where you talk only once and the one input process (your speech into the mic) yields 2 separate types of output (STT and a fileSink of, say, the MP3), then you must split the audio as it exits the HAL layer from the mic.

For example:

  1. Pick up the raw audio as 16-bit PCM coming out of the mic's buffer

  2. Split the above buffer's bytes (you can get a stream from the buffer and pipe the stream to 2 places)

  3. Send stream 1 to the STT API, either before or after you encode it (there are STT APIs that accept either raw 16-bit PCM or encoded audio)

  4. Send stream 2 to an encoder, then to the fileSink to capture the recording
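
The four steps above can be sketched roughly as follows, in plain Java so it runs without a device. The names PcmSplitDemo, split and toWav are hypothetical. As a stand-in for a real encoder (MP3, Opus, AMR), the sketch only wraps the PCM in a WAV header, assuming mono 16-bit samples; on Android the input would come from the mic buffer instead of an arbitrary InputStream.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch of the pipeline: one PCM source, two outputs.
final class PcmSplitDemo {

    // Steps 1-2: read raw PCM chunks and hand the same bytes to both branches.
    static long split(InputStream mic, OutputStream sttBranch, OutputStream recordingBranch)
            throws IOException {
        byte[] buffer = new byte[2048];
        long total = 0;
        int n;
        while ((n = mic.read(buffer)) != -1) {
            sttBranch.write(buffer, 0, n);        // step 3: bytes for the STT API
            recordingBranch.write(buffer, 0, n);  // step 4: bytes for the encoder
            total += n;
        }
        return total;
    }

    // Step 4 stand-in "encoder": prepends a 44-byte WAV header
    // to mono 16-bit little-endian PCM.
    static byte[] toWav(byte[] pcm, int sampleRate) {
        ByteBuffer out = ByteBuffer.allocate(44 + pcm.length).order(ByteOrder.LITTLE_ENDIAN);
        out.put("RIFF".getBytes(StandardCharsets.US_ASCII));
        out.putInt(36 + pcm.length);      // size of everything after this field
        out.put("WAVE".getBytes(StandardCharsets.US_ASCII));
        out.put("fmt ".getBytes(StandardCharsets.US_ASCII));
        out.putInt(16);                   // fmt chunk size
        out.putShort((short) 1);          // audio format 1 = uncompressed PCM
        out.putShort((short) 1);          // channels (mono)
        out.putInt(sampleRate);
        out.putInt(sampleRate * 2);       // byte rate = rate * channels * 2 bytes
        out.putShort((short) 2);          // block align = channels * 2 bytes
        out.putShort((short) 16);         // bits per sample
        out.put("data".getBytes(StandardCharsets.US_ASCII));
        out.putInt(pcm.length);
        out.put(pcm);
        return out.array();
    }
}
```

Collect the recording branch in a ByteArrayOutputStream (or a file), then pass its bytes through toWav, or through a real encoder, once the capture stops.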

The split can operate on either the actual buffer produced by the mic or on a derivative stream of those same bytes.
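
For the derivative-stream variant, one possible shape is a tee: a wrapper stream that copies every byte its consumer reads into a second sink. This is only a sketch in plain Java; PcmTeeInputStream is a hypothetical name, and the wrapped source stands in for whatever stream you derive from the mic buffer.

```java
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Hypothetical tee stream: the main consumer (e.g. the STT client) reads
// from this stream, and every byte it reads is also written to 'branch'
// (e.g. the encoder feeding the fileSink).
final class PcmTeeInputStream extends FilterInputStream {
    private final OutputStream branch;

    PcmTeeInputStream(InputStream source, OutputStream branch) {
        super(source);
        this.branch = branch;
    }

    @Override
    public int read() throws IOException {
        int b = super.read();
        if (b != -1) {
            branch.write(b);
        }
        return b;
    }

    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
        int n = super.read(buf, off, len);
        if (n > 0) {
            branch.write(buf, off, n); // copy exactly the bytes just read
        }
        return n;
    }

    @Override
    public boolean markSupported() {
        return false; // rewinding would duplicate bytes in the branch
    }

    @Override
    public void close() throws IOException {
        try {
            super.close();
        } finally {
            branch.close(); // close the side sink together with the source
        }
    }
}
```

Note that bytes skipped with skip() never reach the branch, so the main consumer should read the stream rather than skip over it.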

For what you are getting into, I recommend you look at getCurrentRecording() and consumeRecording() here.

STT API reference: Google "pultz speech-api". Note that there are use cases for the APIs mentioned there.

  • buferUtils
  • code
  • more code