I'm using speex to encode some audio data and send it over UDP, and decode it on the other side.
I ran a few tests with speex, and noticed that if I decode a packet straight after I encoded it, the decoded data is in no way close to the original data. Most of the bytes at the start of the buffer are 0.
So when I decode the audio sent over UDP, all I get is noise.
This is how I am encoding the audio:
    bool AudioEncoder::encode( float *raw, char *encoded_bits )
    {
        for ( size_t i = 0; i < 256; i++ )
            this->_rfdata[i] = raw[i];

        speex_bits_reset(&this->_bits);
        speex_encode(this->_state, this->_rfdata, &this->_bits);
        int bytesWritten = speex_bits_write(&this->_bits, encoded_bits, 512);

        if (bytesWritten)
            return true;
        return false;
    }
This is how I am decoding the audio:
    float *f = new float[256];
    // recvbuf is the buffer I pass to my recv function on the socket
    speex_bits_read_from(&this->_bits, recvbuf, 512);
    speex_decode(this->_state, &this->_bits, f);
I've checked the docs, and most of my code comes from the encoding/decoding sample on the Speex website.
I'm not sure what I'm missing here.
I found the reason the encoded data was so different. Part of it is that Speex is a lossy codec, as Paulo Scardine said; the other part is that Speex only works on frames of 160 samples (in narrowband mode), so data coming from PortAudio has to be handed to Speex in chunks of 160 samples.
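The re-framing from PortAudio's buffer size (e.g. 256 frames per callback) down to Speex's 160-sample frames can be sketched like this. The FrameAccumulator class is my own illustration, not part of Speex or PortAudio:

    #include <vector>
    #include <cstddef>

    // Hypothetical helper: PortAudio may deliver buffers of any length
    // (e.g. 256 samples), but Speex narrowband consumes exactly 160
    // samples per encode call, so we buffer incoming audio and only
    // emit complete 160-sample frames.
    class FrameAccumulator {
    public:
        explicit FrameAccumulator(size_t frame_size) : _frame_size(frame_size) {}

        // Append raw samples; returns every complete frame now available.
        std::vector<std::vector<float>> push(const float *data, size_t count)
        {
            _pending.insert(_pending.end(), data, data + count);
            std::vector<std::vector<float>> frames;
            while (_pending.size() >= _frame_size) {
                frames.emplace_back(_pending.begin(), _pending.begin() + _frame_size);
                _pending.erase(_pending.begin(), _pending.begin() + _frame_size);
            }
            return frames;
        }

    private:
        size_t _frame_size;
        std::vector<float> _pending;
    };

Each returned frame can then be passed to speex_encode as one unit; leftover samples stay buffered until the next PortAudio callback.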
Actually, Speex introduces an additional delay into the audio data, as I found out by reverse engineering:
narrow band : delay = 200 - framesize + lookahead = 200 - 160 + 40 = 80 samples
wide band : delay = 400 - framesize + lookahead = 400 - 320 + 143 = 223 samples
ultra-wide band : delay = 800 - framesize + lookahead = 800 - 640 + 349 = 509 samples
Since the lookahead is initialized with zeros, you observe that the first few samples are close to zero.
To get the timing right, you must skip those samples before you get to the actual audio data you fed into the codec. Why that is, I don't know. Probably the author of Speex never cared, since Speex is meant for streaming, not primarily for storing and restoring audio data.
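The skipping step is plain buffer arithmetic; a minimal sketch (the helper name is my own, and the delay of 80 samples assumes narrowband):

    #include <vector>
    #include <cstddef>

    // Drop the first `delay` samples of the decoded stream so the output
    // lines up with the audio originally fed to the encoder.
    // For narrowband, delay = 80 samples (encoder + decoder lookahead).
    std::vector<float> skip_codec_delay(const std::vector<float> &decoded, size_t delay)
    {
        if (decoded.size() <= delay)
            return {};
        return std::vector<float>(decoded.begin() + delay, decoded.end());
    }
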
Another workaround (so as not to waste space) is to feed (framesize - delay) zeros into the codec before feeding your actual audio data, and then to drop the entire first Speex frame.
I hope this clarifies things. If someone familiar with Speex internals reads this, feel free to correct me if I am wrong.
EDIT: Actually, the decoder and the encoder each have their own lookahead time. The actual formula for the delay is:
narrow band : delay = decoder_lh + encoder_lh = 40 + 40 = 80 samples
wide band : delay = decoder_lh + encoder_lh = 80 + 143 = 223 samples
ultra-wide band : delay = decoder_lh + encoder_lh = 160 + 349 = 509 samples
You may want to have a look here for some simple encoding/decoding:
http://www.speex.org/docs/manual/speex-manual/node13.html#SECTION001310000000000000000
Since you are using UDP, you may also want to use a jitter buffer to re-order packets and compensate for variable network delay.
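At its simplest, a jitter buffer is a map keyed by sequence number that releases packets in order. This is a deliberately simplified, hypothetical sketch; a production jitter buffer (such as the one shipped with Speex) also handles playout timing, packet loss and late arrivals:

    #include <map>
    #include <vector>
    #include <cstdint>

    // Hypothetical minimal reorder buffer: packets are stored under their
    // sequence number and handed out strictly in order. A missing packet
    // simply blocks get() until it arrives (no timeout or loss handling).
    class ReorderBuffer {
    public:
        // Store a packet under its RTP-style sequence number.
        void put(uint32_t seq, std::vector<char> payload)
        {
            _packets[seq] = std::move(payload);
        }

        // Pop the next in-order packet if it has arrived; false otherwise.
        bool get(std::vector<char> &out)
        {
            auto it = _packets.find(_next);
            if (it == _packets.end())
                return false;
            out = std::move(it->second);
            _packets.erase(it);
            ++_next;
            return true;
        }

    private:
        uint32_t _next = 0;
        std::map<uint32_t, std::vector<char>> _packets;
    };
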