Question:

I am making an Android-to-Android VoIP (loudspeaker) app using Android's AudioRecord and AudioTrack classes, along with Speex via the NDK for echo cancellation. I can successfully pass data into and retrieve data from Speex's speex_echo_cancellation() function, but the echo remains.

Here is the relevant Android thread code that records/sends and receives/plays audio:

//constructor
public MyThread(DatagramSocket socket, int frameSize, int filterLength){
  this.socket = socket;
  nativeMethod_initEchoState(frameSize, filterLength);
}

public void run(){

  short[] audioShorts, recvShorts, recordedShorts, filteredShorts;
  byte[] audioBytes, recvBytes;
  int shortsRead;
  DatagramPacket packet;

  //initialize recorder and player
  int samplingRate = 8000;
  int managerBufferSize = 2000;
  AudioTrack player = new AudioTrack(AudioManager.STREAM_MUSIC, samplingRate, AudioFormat.CHANNEL_OUT_MONO, AudioFormat.ENCODING_PCM_16BIT, managerBufferSize, AudioTrack.MODE_STREAM);
  recorder = new AudioRecord(MediaRecorder.AudioSource.MIC, samplingRate, AudioFormat.CHANNEL_IN_MONO, AudioFormat.ENCODING_PCM_16BIT, managerBufferSize);
  recorder.startRecording();
  player.play();

  //record first packet
  audioShorts = new short[1000];
  shortsRead = recorder.read(audioShorts, 0, audioShorts.length);

  //convert shorts to bytes to send
  audioBytes = new byte[shortsRead*2];
  ByteBuffer.wrap(audioBytes).order(ByteOrder.LITTLE_ENDIAN).asShortBuffer().put(audioShorts);

  //send bytes
  packet = new DatagramPacket(audioBytes, audioBytes.length);
  socket.send(packet);

  while (!this.isInterrupted()){

    //receive packet/bytes (received audio data should have echo cancelled already)
    recvBytes = new byte[2000];
    packet = new DatagramPacket(recvBytes, recvBytes.length);
    socket.receive(packet);

    //convert bytes to shorts
    recvShorts = new short[packet.getLength()/2];
    ByteBuffer.wrap(packet.getData(), 0, packet.getLength()).order(ByteOrder.LITTLE_ENDIAN).asShortBuffer().get(recvShorts);

    //play shorts
    player.write(recvShorts, 0, recvShorts.length);

    //record shorts
    recordedShorts = new short[1000];
    shortsRead = recorder.read(recordedShorts, 0, recordedShorts.length);

    //send played and recorded shorts into speex, 
    //returning audio data with the echo removed
    filteredShorts = nativeMethod_speexEchoCancel(recordedShorts, recvShorts);

    //convert filtered shorts to bytes
    audioBytes = new byte[shortsRead*2];
    ByteBuffer.wrap(audioBytes).order(ByteOrder.LITTLE_ENDIAN).asShortBuffer().put(filteredShorts);

    //send off bytes
    packet = new DatagramPacket(audioBytes, audioBytes.length);
    socket.send(packet);                

  }//end of while loop 

}

Here is the relevant NDK / JNI code:

void nativeMethod_initEchoState(JNIEnv *env, jobject jobj, jint frameSize, jint filterLength){
  echo_state = speex_echo_state_init(frameSize, filterLength);
}

jshortArray nativeMethod_speexEchoCancel(JNIEnv *env, jobject jObj, jshortArray input_frame, jshortArray echo_frame){

  //create native shorts from java shorts
  jshort *native_input_frame = (*env)->GetShortArrayElements(env, input_frame, NULL);
  jshort *native_echo_frame = (*env)->GetShortArrayElements(env, echo_frame, NULL);

  //allocate memory for output data
  jint length = (*env)->GetArrayLength(env, input_frame);
  jshortArray temp = (*env)->NewShortArray(env, length);
  jshort *native_output_frame = (*env)->GetShortArrayElements(env, temp, 0);

  //call echo cancellation
  speex_echo_cancellation(echo_state, native_input_frame, native_echo_frame, native_output_frame);

  //convert native output to java layer output
  jshortArray output_shorts = (*env)->NewShortArray(env, length);
  (*env)->SetShortArrayRegion(env, output_shorts, 0, length, native_output_frame);

  //cleanup and return
  (*env)->ReleaseShortArrayElements(env, input_frame, native_input_frame, 0);
  (*env)->ReleaseShortArrayElements(env, echo_frame, native_echo_frame, 0);
  (*env)->ReleaseShortArrayElements(env, temp, native_output_frame, 0);
  return output_shorts;
}

This code runs fine, and audio data is definitely being sent, received, processed, and played from Android to Android. Given an audio sample rate of 8000 Hz and a packet size of 2000 bytes (1000 shorts), I've found that a frameSize of 1000 is needed for the played audio to be smooth. Most values of filterLength (the tail length, according to the Speex documentation) will run, but they seem to have no effect on echo removal.

Does anyone understand AEC well enough to give me some pointers on implementing or configuring Speex? Thanks for reading.

Answer 1:

Your code is right, but the native code is missing something. I modified the init method and added the Speex preprocessor after echo cancellation, and then your code worked well (I tried it on Windows). Here is the native code:

#include <jni.h>
#include "speex/speex_echo.h"
#include "speex/speex_preprocess.h"
#include "EchoCanceller_jniHeader.h"
SpeexEchoState *st;
SpeexPreprocessState *den;

JNIEXPORT void JNICALL Java_speex_EchoCanceller_open
  (JNIEnv *env, jobject jObj, jint jSampleRate, jint jBufSize, jint jTotalSize)
{
     //init
     int sampleRate=jSampleRate;
     st = speex_echo_state_init(jBufSize, jTotalSize);
     den = speex_preprocess_state_init(jBufSize, sampleRate);
     speex_echo_ctl(st, SPEEX_ECHO_SET_SAMPLING_RATE, &sampleRate);
     speex_preprocess_ctl(den, SPEEX_PREPROCESS_SET_ECHO_STATE, st);
}

JNIEXPORT jshortArray JNICALL Java_speex_EchoCanceller_process
  (JNIEnv * env, jobject jObj, jshortArray input_frame, jshortArray echo_frame)
{
  //create native shorts from java shorts
  jshort *native_input_frame = (*env)->GetShortArrayElements(env, input_frame, NULL);
  jshort *native_echo_frame = (*env)->GetShortArrayElements(env, echo_frame, NULL);

  //allocate memory for output data
  jint length = (*env)->GetArrayLength(env, input_frame);
  jshortArray temp = (*env)->NewShortArray(env, length);
  jshort *native_output_frame = (*env)->GetShortArrayElements(env, temp, 0);

  //call echo cancellation
  speex_echo_cancellation(st, native_input_frame, native_echo_frame, native_output_frame);
  //preprocess output frame
  speex_preprocess_run(den, native_output_frame);

  //convert native output to java layer output
  jshortArray output_shorts = (*env)->NewShortArray(env, length);
  (*env)->SetShortArrayRegion(env, output_shorts, 0, length, native_output_frame);

  //cleanup and return
  (*env)->ReleaseShortArrayElements(env, input_frame, native_input_frame, 0);
  (*env)->ReleaseShortArrayElements(env, echo_frame, native_echo_frame, 0);
  (*env)->ReleaseShortArrayElements(env, temp, native_output_frame, 0);

  return output_shorts;   
}

JNIEXPORT void JNICALL Java_speex_EchoCanceller_close
  (JNIEnv *env, jobject jObj)
{
     //close
     speex_echo_state_destroy(st);
     speex_preprocess_state_destroy(den);
}

You can find useful samples, such as encoding, decoding, and echo cancellation, in the Speex library's source (http://www.speex.org/downloads/).
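
One thing to keep in mind with the native code above: speex_echo_cancellation() works on exactly one frame of the size given to speex_echo_state_init() per call, so the Java side has to hand it equal-length frames. The Speex manual suggests a frame size on the order of 20 ms, which would be 160 samples at 8000 Hz. Below is a rough Java sketch of splitting a larger packet into such frames. EchoFrameProcessor and FramedEchoCanceller are hypothetical names made up for this illustration (the interface stands in for the native process method), and dropping the trailing partial frame is a simplification, not something Speex requires.

```java
/** Hypothetical stand-in for the native process(input_frame, echo_frame) call. */
interface EchoFrameProcessor {
    short[] process(short[] micFrame, short[] farEndFrame);
}

/** Hypothetical helper: feeds a larger buffer to the canceller one frame at a time. */
final class FramedEchoCanceller {

    /**
     * Splits mic and farEnd into frames of frameSize samples and processes each pair.
     * Only whole frames are processed; a trailing partial frame is dropped (simplification).
     */
    static short[] processInFrames(short[] mic, short[] farEnd,
                                   int frameSize, EchoFrameProcessor aec) {
        if (frameSize <= 0) {
            throw new IllegalArgumentException("frameSize must be positive");
        }
        int usable = Math.min(mic.length, farEnd.length);
        usable -= usable % frameSize; // keep whole frames only

        short[] out = new short[usable];
        for (int off = 0; off < usable; off += frameSize) {
            short[] micFrame = java.util.Arrays.copyOfRange(mic, off, off + frameSize);
            short[] farFrame = java.util.Arrays.copyOfRange(farEnd, off, off + frameSize);
            short[] cleaned = aec.process(micFrame, farFrame);
            System.arraycopy(cleaned, 0, out, off, frameSize);
        }
        return out;
    }
}
```

With 1000-sample packets and 160-sample frames the last 40 samples would be dropped, so it is probably simpler to pick a packet size that is a multiple of the frame size (960 samples, for example).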

Answer 2:

Are you properly aligning the far-end signal (what you call recv) and the near-end signal (what you call record)? There is always some playback/record latency that needs to be accounted for. This generally requires buffering the far-end signal in a ring buffer for some specified period of time. On PCs this is usually about 50 to 120 ms. On Android I suspect it's much higher, probably in the range of 150 to 400 ms. I would recommend using a 100 ms tail length with Speex and adjusting the size of your far-end buffer until the AEC converges. These changes should allow the AEC to converge regardless of whether the preprocessor is included, which is not required here.
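
To make the far-end buffering concrete, here is a minimal Java sketch of a fixed-delay ring buffer. FarEndDelayLine is a hypothetical helper written for this answer, not part of Speex or the Android SDK. The delay is passed in samples, and any starting value (say 200 ms, i.e. 1600 samples at 8000 Hz) is only a guess to be tuned on the actual device.

```java
/** Hypothetical helper: delays the far-end (played) signal by a fixed number of samples. */
final class FarEndDelayLine {
    private final short[] ring; // holds the most recent delaySamples played samples
    private int pos = 0;        // next slot to read (oldest sample) and then overwrite

    FarEndDelayLine(int delaySamples) {
        if (delaySamples <= 0) {
            throw new IllegalArgumentException("delaySamples must be positive");
        }
        ring = new short[delaySamples];
    }

    /** Converts a duration in milliseconds to a sample count at the given rate. */
    static int msToSamples(int sampleRateHz, int ms) {
        return sampleRateHz * ms / 1000;
    }

    /**
     * Stores the frame that was just played and returns a frame of the same length
     * containing what was played delaySamples earlier (zeros until the buffer fills).
     */
    short[] push(short[] played) {
        short[] delayed = new short[played.length];
        for (int i = 0; i < played.length; i++) {
            delayed[i] = ring[pos];   // oldest stored sample
            ring[pos] = played[i];    // replace it with the newest one
            pos = (pos + 1) % ring.length;
        }
        return delayed;
    }
}
```

In the loop from the question, the idea would be to pass delayLine.push(recvShorts) to the echo canceller instead of recvShorts itself, then lengthen or shorten the delay until the echo starts to drop.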