可以将文章内容翻译成中文,广告屏蔽插件可能会导致该功能失效(如失效，请关闭广告屏蔽插件后再试):

问题:

My question is not completely programming-related, but nevertheless I think SO is the right place to ask.

In my program I generate some audio data and save the track to a WAV file. Everything works fine with one sound generator. But now I want to add more generators and mix the generated audio data into one file. Unfortunately it is more complicated than it seems at first sight. Moreover I didn't find much useful information on how to mix a set of audio samples.

So is there anyone who can give me advice?

edit:

I'm programming in C++. But it doesn't matter, since I was interested in the theory behind mixing two audio tracks. The problem I have is that I cannot just sum up the samples, because this often produces distorted sound.

回答1:

I assume your problem is that for every audio source you're adding in, you're having to lower the levels.

If the app gives control to a user, just let them control the levels directly. Hotness is their responsibility, not yours. This is "summing."

If the mixing is automated, you're about to go on a journey. You'll probably need compression, if not limiting. (Limiting is an extreme version of compression.)

Note that anything you do to the audio (including compression and limiting) is a form of distortion, so you WILL have coloration of the audio. Your choice of compression and limiting algorithms will affect the sound.

Since you're not generating the audio in real time, you have the possibility of doing "brick wall" limiting. That's because you have foreknowledge of the levels. Realtime limiting is more limited because you can't know what's coming up--you have to be reactive.

Is this music, sound effects, voices, what?

Programmers here deal with this all the time.

回答2:

Mixing audio samples means adding them together, that's all. Typically you do add them into a larger data type so that you can detect overflow and clamp the values before casting back into your destination buffer. If you know beforehand that you will have overflow then you can scale their amplitudes prior to addition - simply multiply by a floating point value between 0 and 1, again keeping in mind the issue of precision, perhaps converting to a larger data type first.

If you have a specific problem that is not addressed by this, feel free to update your original question.