How to do mono to stereo conversion?

Posted 2019-07-13 11:59

Question:

I am using libswresample to resample from any PCM format to 44.1kHz, 16bit int, stereo.

I was playing around with some volume analysis of the resulting audio stream and found that when the source is 44.1kHz, 16bit int mono, I get roughly this formula:

leftSample = sourceSample / sqrt(2);
rightSample = sourceSample / sqrt(2);

But I was expecting:

leftSample = sourceSample;
rightSample = sourceSample;

(In case the source is stereo, I simply have leftSample = leftSourceSample; rightSample = rightSourceSample;.)

My expectation comes from several sources:

  1. That is how my own straightforward solution would probably have worked.
  2. I searched around a bit and other people seem to do the same, e.g. here.
  3. A very common ReplayGain implementation (the only one I know of, actually; it is used basically everywhere and I think it originally comes from mp3gain; one copy can be seen here) also does it:

    switch ( num_channels) {
    case  1: right_samples = left_samples;
    case  2: break;
    default: return GAIN_ANALYSIS_ERROR;
    }
    

    This is especially relevant because ReplayGain was calibrated with this implementation using a reference sound (pink noise, can be downloaded here), which is mono.

    In the ReplayGain specification, it is also calculated like this (see here).

My confusion arose when I tried to implement ReplayGain myself and stumbled upon this.

So, some questions:

  1. Why does libswresample do this?
  2. Is this expected in libswresample or a bug? (I'm trying to understand from the source (e.g. here) but I haven't fully understood it all yet.) I added a bug report here.
  3. What is the "right" solution?
  4. What are other players doing?
  5. What is a common soundcard doing if you feed mono samples to it?

(I also posted this question on avp.stackexchange now; maybe that is a better place to ask about this, not sure.)

Answer 1:

The implementation is one correct way of "panning" a mono signal into a stereo field. If you instead pan all the way left or all the way right, you want the signal power to be the same as if it had been panned to the center, so the three cases would be:

//left panning
leftSample = sourceSample;
rightSample = 0;
//right panning
leftSample = 0;
rightSample = sourceSample;
//center panning (same power as hard left or hard right)
leftSample = sourceSample * sqrt(2)/2;
rightSample = sourceSample * sqrt(2)/2;
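
To see why the center gain is sqrt(2)/2, it may help to compare the summed power of both channels in each case. The following is only a minimal sketch: the type and function names are hypothetical and are not part of libswresample, and samples are treated as floating-point values rather than 16bit ints.

```c
/* sqrt(2)/2, written as a literal so the sketch needs no math library. */
#define CENTER_GAIN 0.70710678118654752440

/* Hypothetical type: one stereo frame of floating-point samples. */
typedef struct {
    double left;
    double right;
} StereoFrame;

/* Hard-left panning: all of the signal goes to the left channel. */
static StereoFrame pan_hard_left(double s)
{
    StereoFrame f = { s, 0.0 };
    return f;
}

/* Center panning with the sqrt(2)/2 gain from the answer. */
static StereoFrame pan_center(double s)
{
    StereoFrame f = { s * CENTER_GAIN, s * CENTER_GAIN };
    return f;
}

/* Plain duplication at full strength (no panning law applied). */
static StereoFrame duplicate(double s)
{
    StereoFrame f = { s, s };
    return f;
}

/* Summed power of the two channels: left^2 + right^2. */
static double frame_power(StereoFrame f)
{
    return f.left * f.left + f.right * f.right;
}
```

For s = 0.5, frame_power is about 0.25 for both pan_hard_left and pan_center, and 0.5 for duplicate. Plain duplication therefore carries twice the summed power of a hard-panned signal (about 3 dB more), which only matters if you compare centered and panned signals.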

However, if you are just converting from mono to stereo, your intuition is correct. There is no reason to lower the level, since you won't be comparing centered signals to panned ones. The best way to go is to leave the signal at full strength:

//mono to stereo conversion
leftSample = sourceSample;
rightSample = sourceSample;
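
For the 16bit int format from the question, the full-strength conversion could be sketched as below. This assumes interleaved stereo output (L, R, L, R, ...); the function name mono_to_stereo_s16 is made up for illustration and is not a libswresample call.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: copy each mono sample unchanged into both
 * channels of an interleaved stereo buffer.
 * dst must have room for 2 * n samples.
 */
static void mono_to_stereo_s16(const int16_t *src, int16_t *dst, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        dst[2 * i]     = src[i]; /* left  */
        dst[2 * i + 1] = src[i]; /* right */
    }
}
```

Because the samples are copied without scaling, no rounding or clipping can occur, even for the extreme values -32768 and 32767.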

It's also possible that they are leaving room for some gain change after the sample-rate conversion, but the level seems arbitrary.