可以将文章内容翻译成中文,广告屏蔽插件可能会导致该功能失效(如失效，请关闭广告屏蔽插件后再试):

问题:

I already asked about audio volume normalization. On most methods (e.g. ReplayGain, which I am most interested in), I might get peaks that exceed the PCM limit (as can also be read here).

Simple clipping would probably be the worst thing I can do. As Wikipedia suggests, I should do some form of dynamic range compression.

I am speaking about the function which I'm applying on each individual PCM sample value. On another similar question, one answer suggests that doing this is not enough or not the thing I should do. However, I don't really understand that as I still have to handle the clipping case. Does the answer suggest to do the range compression on multiple samples at once and do to simple hard clipping in addition on every sample?

Leaving that aside, the functions discussed in the Wikipedia article seem to be somewhat not what I want (in many cases, I would still have the clipping case in the end). I am thinking about using something like tanh. Is that a bad idea? It would reduce the volume slightly but guarantee that I don't get any clipping.

My application is a generic music player. I am searching for a solution which mostly works best for everyone so that I can always turn it on and the user very likely does not want to turn this off.

回答1:

Using any instantaneous dynamic range processing (such as clipping or tanh non-linearity) will introduce audible distortion. Put a sine wave into an instantaneous non-linear function and you no longer have a sine wave. While useful for certain audio applications, it sounds like you do not want these artefacts.

Normalization does not effect the dynamics (in terms of min/max ratio) of a waveform. Normalization involves element-wise multiplication of a waveform by a constant scalar value to ensure no samples ever exceed a maximum value. This process can only by done off-line, as you need to analyse the entire signal before processing. Normalization is also a bad idea if your waveform contains any intense transients. Your entire signal will be attenuated by the ratio of the transient peak value divided by the clipping threshold.

If you just want to protect the output from clipping you are best off using a side chain type compressor. A specific form of this is the limiter (infinite compression ratio above a threshold with zero attack time). A side-chain compressor calculates the smoothed energy envelope of a signal and then applies a varying gain according to that function. They are not instantaneous, so you reduce audible distortion that you'd get from the functions you mention. A limiter can have instantaneous attack to prevent from clipping, but you allow a release time so that the limiter remains attenuating for subsequent waveform peaks, the subsequent waveform is just turned down and so there is no distortion. After the intense sound, the limiter recovers.

You can get a pumping type sound from this type of processing if there are a lot of high intensity peaks in the waveform. If this becomes problematic, you can then move to the next level and do the dynamics processing within sub-bands. This way, only the offending parts of the frequency spectrum will be attenuated, leaving the rest of the sound unaffected.

回答2:

The general solution is to normalize to some gain level significantly below 1 such that very few songs require adding gain. In other words, most of the time you will be lowering the volume of signal rather than increasing. Experiment with a wide variety of songs in different styles to figure out what this level is.

Now, occasionally, you'll still come across a song that requires enough gain that, that, at some point, it would clip. You have two options: 1. don't add that much gain. This one song will sound a bit quieter. C'est la vie. (this is a common approach), or 2. apply a small amount of dynamic range compression and/or limiting. Of course, you can also do some combination 1 and 2. I believe iTunes uses a combination of 1 and 2, but they've worked very hard on #2, and they apply very little.

Your suggestion, using a function like tanh, on a sample-by-sample basis, will result in audible distortion. You don't want to do this for a generic music player. This is the sort of thing that's done in guitar amp simulators to make them sound "dirty" and "grungy". It might not be audible in rock, pop, or other modern music which is heavy on distortion already, but on carefully recorded choral, jazz or solo violin music people will be upset. This has nothing to do with the choice of tanh, by the way, any nonlinear function will produce distortion.

Dynamic range compression uses envelopes that are applied over time to the signal: http://en.wikipedia.org/wiki/Dynamic_range_compression This is tricky to get right, and you can never create a compressor that is truly "transparent". A limiter can be thought of as an extreme version of a compressor that (at least in theory) prevents signal from going above a certain level. A digital "lookahead" limiter can do so without noticeable clipping. When judiciously used, it is pretty transparent.

If you take this approach, make sure that this feature can be turned off, because no matter how transparent you think it is, someone will hear it and not like it.