How to remove pops from concatenated sound data in PyAudio

Published 2019-02-15 17:03

Question:

How do you remove the "popping" and "clicking" sounds in audio constructed by concatenating tonal sound clips together?

I have this PyAudio code for generating a series of tones:

import time
import math
import pyaudio

class Beeper(object):

    def __init__(self, **kwargs):
        self.bitrate = kwargs.pop('bitrate', 16000)
        self.channels = kwargs.pop('channels', 1)
        self._p = pyaudio.PyAudio()
        self.stream = self._p.open(
            format = self._p.get_format_from_width(1), 
            channels = self.channels, 
            rate = self.bitrate, 
            output = True,
        )
        self._queue = []

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_val, exc_tb):
        self.stream.stop_stream()
        self.stream.close()

    def tone(self, frequency, length=1000, play=False, **kwargs):

        number_of_frames = int(self.bitrate * length/1000.)

        ## TODO: fix pops?
        for x in xrange(number_of_frames):
            # 8-bit unsigned samples: sine scaled to 0..255, centred on 128.
            self._queue.append(chr(int(math.sin(x/((self.bitrate/float(frequency))/math.pi))*127+128)))

    def play(self):
        sound = ''.join(self._queue)
        self.stream.write(sound)
        time.sleep(0.1)

with Beeper(bitrate=88000, channels=2) as beeper:
    i = 0
    for f in xrange(1000, 800-1, int(round(-25/2.))):
        i += 1
        length = math.log(i+1) * 250/2./2.
        beeper.tone(frequency=f, length=length)
    beeper.play()

but when the tone changes, there's a distinctive "pop" in the audio, and I'm not sure how to remove it.

At first, I thought the pop was occurring because I was playing each clip immediately, and the time spent generating the next clip between playbacks was enough of a delay to make the audio flatline. However, when I concatenated all the clips into a single string and played that, the pop was still there.

Then, I thought the sine waves weren't matching at the boundaries between clips, so I tried averaging the first N frames of the current clip with the last N frames of the previous clip, but that also had no effect.
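
Roughly, the boundary averaging I mean looks something like this (a sketch only, with an illustrative helper name and window size, not my exact code):

def blend_boundary(prev_clip, next_clip, n=100):
    # Average the last n frames of the previous clip with the first n
    # frames of the next clip (8-bit unsigned samples, as in the code above).
    blended = []
    for i in xrange(n):
        a = ord(prev_clip[-n + i])
        b = ord(next_clip[i])
        blended.append(chr((a + b) // 2))
    return prev_clip[:-n] + ''.join(blended) + next_clip[n:]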

What am I doing wrong? How do I fix this?

Answer 1:

The answer you've written for yourself will do the trick but isn't really the correct way to do this type of thing.

One of the problems is that you're checking for the "tip" or peak of the sine wave by comparing against 1. Not all sine frequencies will hit that value exactly, and some may require a large number of cycles to do so.

Mathematically speaking, the peak of the sine occurs where the phase is pi/2 + 2*pi*K for any integer K, i.e. sin(pi/2 + 2*pi*K) = 1.

To compute a sine for a given frequency you use the formula y = sin(2pi * x * f0/fs), where x is the sample number, f0 is the sine frequency, and fs is the sample rate.

For a nice number like 1 kHz at a 48 kHz sample rate, when x = 12:

sin(2pi * 12 * 1000/48000) = sin(2pi * 12/48) = sin(pi/2) = 1

However, at a frequency like 997 Hz, the true peak falls a fraction of a sample after sample 12:

sin(2pi * 11 * 997/48000) = 0.99087178042
sin(2pi * 12 * 997/48000) = 0.99998889671
sin(2pi * 13 * 997/48000) = 0.99209828673
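
You can confirm those numbers with a couple of lines of Python (this is just a check, not part of the fix):

import math

fs, f0 = 48000.0, 997.0
for x in (11, 12, 13):
    # None of these samples lands exactly on the peak (sin == 1.0);
    # the true peak falls between samples 12 and 13.
    print x, math.sin(2 * math.pi * x * f0 / fs)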

A better method of stitching the waveforms together is to keep track of the phase from one tone and use that as the starting phase for the next.

First, for a given frequency you need to figure out the phase increment. Notice it is the same as your existing expression with the sample index x factored out:

phInc = 2*pi*f0/fs

Next, compute the sine and update a variable representing the current phase.

for x in xrange(number_of_frames):
    y = math.sin(self._phase)
    self._phase += phInc

Putting it all together:

def tone(self, frequency, length=1000, play=False, **kwargs):

    number_of_frames = int(self.bitrate * length/1000.)
    phInc = 2*math.pi*frequency/self.bitrate

    for x in xrange(number_of_frames):
        y = math.sin(self._phase)
        self._phase += phInc
        # Scale to the same 8-bit unsigned range (0..255) used in the question.
        self._queue.append(chr(int(y*127+128)))
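
Note that self._phase is never created in the original class, so it also has to be initialized before the first call to tone(). A minimal, self-contained sketch of the phase-accumulator idea (the class and names here are illustrative, not the poster's code):

import math

class PhaseTone(object):

    def __init__(self, sample_rate=16000):
        self.sample_rate = float(sample_rate)
        self.phase = 0.0  # running phase in radians, carried across tones

    def tone(self, frequency, n_frames):
        ph_inc = 2 * math.pi * frequency / self.sample_rate
        out = []
        for _ in xrange(n_frames):
            # Same 8-bit unsigned encoding as the question's code.
            out.append(chr(int(math.sin(self.phase) * 127 + 128)))
            self.phase += ph_inc
            if self.phase >= 2 * math.pi:  # optional: keep the float bounded
                self.phase -= 2 * math.pi
        return ''.join(out)

# Two back-to-back tones share the running phase, so there is no jump
# in the waveform at the boundary:
gen = PhaseTone(16000)
data = gen.tone(1000, 1600) + gen.tone(800, 1600)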


Answer 2:

My initial suspicion that the individual waveforms weren't aligning was correct, which I confirmed by inspecting the output in Audacity. My solution was to modify the code so that each waveform starts and stops on the peak of the sine wave.

def tone(self, frequency, length=1000, play=False, **kwargs):

    number_of_frames = int(self.bitrate * length/1000.)

    record = False
    x = 0
    y = 0
    while 1:
        x += 1
        v = math.sin(x/((self.bitrate/float(frequency))/math.pi))

        # Find where the sin tip starts.
        if round(v, 3) == +1:
            record = True

        if record:
            self._queue.append(chr(int(v*127+128)))
            y += 1
            if y > number_of_frames and round(v, 3) == +1:
                # Always end on the high tip of the sine wave so the clips align.
                break


Answer 3:

If you are concatenating clips with varying attributes, you may hear a clicking sound when the peaks of the two clips do not align at the point of concatenation.

One way to get around this is to apply a fade-out at the end of the first signal and then a fade-in at the beginning of the second signal, and to continue this pattern through the rest of the concatenation process.
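
As a rough illustration, here is how a short linear fade-in/fade-out could be applied to the 8-bit samples generated by the Beeper class in the question (the helper name and fade length are my own choices):

def linear_fade(samples, fade_frames=200):
    # samples is a string of 8-bit unsigned values centred on 128.
    # Ramp the amplitude up over the first fade_frames and down over the
    # last, so each clip starts and ends at silence.
    n = min(fade_frames, len(samples) // 2)
    out = list(samples)
    for i in xrange(n):
        gain = i / float(n)  # 0.0 at the clip boundary, ~1.0 n frames in
        out[i] = chr(int((ord(out[i]) - 128) * gain) + 128)            # fade-in
        out[-1 - i] = chr(int((ord(out[-1 - i]) - 128) * gain) + 128)  # fade-out
    return ''.join(out)

# Fade every clip before joining; a few milliseconds is usually enough
# to make the transition inaudible:
# sound = ''.join(linear_fade(clip) for clip in clips)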

I would try out the concatenation in a visual tool like Audacity first: apply a fade-out and fade-in to the clips you want to join, and play around with the timing and settings to get the desired result.

Next, I am not sure PyAudio has any easy way of implementing fading; however, if you can, you may want to try pydub. It provides easy ways to manipulate audio, with both fade-in and fade-out methods as well as a crossfade method, which basically performs the fade-out and fade-in in one step.

You can install pydub with pip install pydub.

Here is some sample code using pydub:

from pydub import AudioSegment
from pydub.playback import play

# Load the first audio segment
audio1 = AudioSegment.from_wav("SineWave_440Hz.wav")

# Load the second audio segment
audio2 = AudioSegment.from_wav("SineWave_150Hz.wav")

# Join with a 1.5 second crossfade
combinedAudio = audio1.append(audio2, crossfade=1500)

# Play the combined audio
play(combinedAudio)

Finally, if you really want the noise and pops cleaned up to a professional standard, you may want to look at PSOLA (Pitch Synchronous Overlap and Add). There, one would convert the audio signals to the frequency domain and then perform PSOLA on the chunks to merge the audio with the minimum possible noise.

That was long, but I hope it helps.