How to convert Linear16 PCM wav to G711 8-bit 8-kh

Posted 2019-08-30 18:18

Question:

I am using NAudio to convert Linear16 PCM WAV files that come out of a third-party Text-To-Speech API to G.711 8-bit 8 kHz mu-law so that they will work as telephony prompts. I am using techniques found in the library author's documentation and in some Stack Overflow posts, and specifically following the suggestion to do a two-step conversion:

// result holds the JSON response from the Text-To-Speech API.
dynamic foo = JsonConvert.DeserializeObject<dynamic>(result);

// The audio arrives as a base64-encoded Linear16 WAV file.
byte[] decoded = Convert.FromBase64String(foo.audioContent.ToString());

// Step 1 target: 8 kHz, 16-bit, mono PCM. Step 2 target: 8 kHz mono mu-law.
WaveFormat newFormat = new WaveFormat(8000, 16, 1);
WaveFormat mulaw = WaveFormat.CreateMuLawFormat(8000, 1);

using (MemoryStream mem = new MemoryStream(decoded))
using (WaveFileReader reader = new WaveFileReader(mem))
using (var conversionStream = new WaveFormatConversionStream(newFormat, reader))
using (var convStream2 = new WaveFormatConversionStream(mulaw, conversionStream))
{
    // Write the converted prompt, plus the untouched source for comparison.
    WaveFileWriter.CreateWaveFile("voiceprompt_downsample_8bit-8khz.wav", convStream2);
    File.WriteAllBytes("voiceprompt_raw.wav", decoded);
}

Unfortunately, the audio quality of the converted file is noticeably degraded (some loss is to be expected). However, if I submit the exact same source file to the converter at g711.org and select the "BroadWorks Classic (8Khz, Mono, u-law)" option, the resulting audio sounds much better. In particular, it does not clip or crush the S sounds in words like "access" and "password" in some of our prompts.

I have confirmed that both audio files (the one I convert with NAudio and the one I generated using g711.org) play fine as prompts through our telephony system.

Does anyone with NAudio experience have suggestions about what I can do differently in NAudio so that the output quality of the converted file matches what I get from g711.org?

Answer 1:

Figured it out myself: the issue was that I needed to use one of the other resampling options instead of relying on WaveFormatConversionStream alone. After resampling with MediaFoundationResampler, the audio quality was much better than what I was getting with ACM via WaveFormatConversionStream.

This doc helped me come to that realization...
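
The answer does not include its final code, so the following is only a minimal sketch of what the change might look like, not the author's actual solution. It assumes the source is a 16-bit mono WAV, that it runs on Windows with Media Foundation available, and that the NAudio version in use exposes MediaFoundationResampler in NAudio.Wave. The class name PromptConverter, the method ConvertToMuLaw, and the outputPath parameter are hypothetical. Because MediaFoundationResampler is an IWaveProvider rather than a WaveStream, it cannot be passed directly to a second WaveFormatConversionStream; this sketch instead encodes each sample with NAudio's MuLawEncoder, which is a substitution of my own and may differ from what the author did.

```csharp
using System;
using System.IO;
using NAudio.Codecs;
using NAudio.MediaFoundation;
using NAudio.Wave;

public static class PromptConverter // hypothetical helper, not from the original post
{
    public static void ConvertToMuLaw(byte[] wavBytes, string outputPath)
    {
        // Harmless if Media Foundation has already been started.
        MediaFoundationApi.Startup();

        var pcm8k = new WaveFormat(8000, 16, 1);              // resampler output
        var muLaw = WaveFormat.CreateMuLawFormat(8000, 1);    // final file format

        using (var mem = new MemoryStream(wavBytes))
        using (var reader = new WaveFileReader(mem))
        using (var resampler = new MediaFoundationResampler(reader, pcm8k))
        using (var writer = new WaveFileWriter(outputPath, muLaw))
        {
            resampler.ResamplerQuality = 60; // 1 (fastest) to 60 (best quality)

            var pcmBuffer = new byte[pcm8k.AverageBytesPerSecond]; // one second of PCM
            var muLawBuffer = new byte[pcmBuffer.Length / 2];      // one byte per sample
            int bytesRead;
            while ((bytesRead = resampler.Read(pcmBuffer, 0, pcmBuffer.Length)) > 0)
            {
                int samples = bytesRead / 2; // 16-bit samples, two bytes each
                for (int i = 0; i < samples; i++)
                {
                    short sample = BitConverter.ToInt16(pcmBuffer, i * 2);
                    muLawBuffer[i] = MuLawEncoder.LinearToMuLawSample(sample);
                }
                writer.Write(muLawBuffer, 0, samples);
            }
        }
    }
}
```

With the variables from the question, this would be called as PromptConverter.ConvertToMuLaw(decoded, "voiceprompt_downsample_8bit-8khz.wav"). The relevant difference from the original code is that the 8 kHz downsampling step is done by MediaFoundationResampler instead of ACM.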