Mixing down two files together using Extended Audi

I am doing some custom audio post-processing using audio units. I am merging two files together (links below), but the output contains some strange noise. What am I doing wrong?

I have verified that before this step the two files (workTrack1 and workTrack2) are in a proper state and sound good. No errors are reported during the process either.

Buffer Processing code:

- (BOOL)mixBuffersWithBuffer1:(const int16_t *)buffer1 buffer2:(const int16_t *)buffer2 outBuffer:(int16_t *)mixbuffer outBufferNumSamples:(int)mixbufferNumSamples {
    BOOL clipping = NO;

    for (int i = 0 ; i < mixbufferNumSamples; i++) {
        int32_t s1 = buffer1[i];
        int32_t s2 = buffer2[i];
        int32_t mixed = s1 + s2;

        if ((mixed < -32768) || (mixed > 32767)) {
            clipping = YES; // don't break here: we don't want to lose data, only to warn the user
        }

        mixbuffer[i] = (int16_t) mixed;
    }
    return clipping;
}

Mixdown code:

////////////////////////////////////////////////////////////////////////////////////////////////////////////////
////////////////////////////////////////////////////////////////////////////////////////////////////////////////
/////////////////////////////////////////////      PHASE 4      ////////////////////////////////////////////////
////////////////////////////////////////////////////////////////////////////////////////////////////////////////
////////////////////////////////////////////////////////////////////////////////////////////////////////////////
// In phase 4, open workTrack1 and workTrack2 for reading,
// mix together, and write out to outfile.

// open the outfile for writing -- this will erase the infile if they are the same, but that's OK because we are done with it
err = [self openExtAudioFileForWriting:outPath audioFileRefPtr:&outputAudioFileRef numChannels:numChannels];
if (err) { [self cleanupInBuffer1:inBuffer1 inBuffer2:inBuffer2 outBuffer:outBuffer err:err]; return NO; }

// setup vars
framesRead = 0;
totalFrames = [self totalFrames:mixAudioFile1Ref]; // the long one.
NSLog(@"Mix-down phase, %d frames (%0.2f secs)", totalFrames, totalFrames / RECORD_SAMPLES_PER_SECOND);

moreToProcess = YES;
while (moreToProcess) {

    conversionBuffer1.mBuffers[0].mDataByteSize = LOOPER_BUFFER_SIZE;
    conversionBuffer2.mBuffers[0].mDataByteSize = LOOPER_BUFFER_SIZE;

    UInt32 frameCount1 = framesInBuffer;
    UInt32 frameCount2 = framesInBuffer;

    // Read a buffer of input samples up to AND INCLUDING totalFrames
    int numFramesRemaining = totalFrames - framesRead; // Todo see if we are off by 1 here.  Might have to add 1
    if (numFramesRemaining == 0) {
        moreToProcess = NO; // If no frames are to be read, then this phase is finished

    } else {
        if (numFramesRemaining < frameCount1) { // see if we are near the end
            frameCount1 = numFramesRemaining;
            frameCount2 = numFramesRemaining;
            conversionBuffer1.mBuffers[0].mDataByteSize = (frameCount1 * bytesPerFrame);
            conversionBuffer2.mBuffers[0].mDataByteSize = (frameCount2 * bytesPerFrame);
        }

        NSLog(@"Attempting to read %d frames from mixAudioFile1Ref", (int)frameCount1);
        err = ExtAudioFileRead(mixAudioFile1Ref, &frameCount1, &conversionBuffer1);
        if (err) { [self cleanupInBuffer1:inBuffer1 inBuffer2:inBuffer2 outBuffer:outBuffer err:err]; return NO; }

        NSLog(@"Attempting to read %d frames from mixAudioFile2Ref", (int)frameCount2);
        err = ExtAudioFileRead(mixAudioFile2Ref, &frameCount2, &conversionBuffer2);
        if (err) { [self cleanupInBuffer1:inBuffer1 inBuffer2:inBuffer2 outBuffer:outBuffer err:err]; return NO; }

        NSLog(@"Read %d frames from mixAudioFile1Ref in mix-down phase", (int)frameCount1);
        NSLog(@"Read %d frames from mixAudioFile2Ref in mix-down phase", (int)frameCount2);

        // If no frames were returned, phase is finished
        if (frameCount1 == 0) {
            moreToProcess = NO;

        } else { // Process pcm data

            // if buffer2 was not filled, fill with zeros
            if (frameCount2 < frameCount1) {
                bzero(inBuffer2 + frameCount2, (frameCount1 - frameCount2));
                frameCount2 = frameCount1;
            }

            const int numSamples = (frameCount1 * bytesPerFrame) / sizeof(int16_t);

            if ([self mixBuffersWithBuffer1:(const int16_t *)inBuffer1
                                    buffer2:(const int16_t *)inBuffer2
                                  outBuffer:(int16_t *)outBuffer
                        outBufferNumSamples:numSamples]) {
                NSLog(@"Clipping");
            }
            // Write pcm data to the main output file
            conversionOutBuffer.mBuffers[0].mDataByteSize = (frameCount1 * bytesPerFrame);
            err = ExtAudioFileWrite(outputAudioFileRef, frameCount1, &conversionOutBuffer);

            framesRead += frameCount1;
        } // frame count
    } // else

    if (err) {
        moreToProcess = NO;
    }
} // while moreToProcess

// Check for errors
TTDASSERT(framesRead == totalFrames);
if (err) {
    if (error) *error = [NSError errorWithDomain:kUAAudioSelfCrossFaderErrorDomain
                                            code:UAAudioSelfCrossFaderErrorTypeMixDown
                                        userInfo:[NSDictionary dictionaryWithObjectsAndKeys:[NSNumber numberWithInt:err],@"Underlying Error Code",[self commonExtAudioResultCode:err],@"Underlying Error Name",nil]];
    [self cleanupInBuffer1:inBuffer1 inBuffer2:inBuffer2 outBuffer:outBuffer err:err];
    return NO;
}
NSLog(@"Done with mix-down phase");

ASSUMPTIONS

mixAudioFile1Ref is always longer than mixAudioFile2Ref
After mixAudioFile2Ref runs out of bytes, outputAudioFileRef should sound exactly the same as mixAudioFile1Ref

The expected result is a fade-in mixed over a fade-out at the beginning, which produces a self-crossfade when the track is looped. Please listen to the output, look at the code, and let me know where I am going wrong.

Source tone sound: http://cl.ly/2g2F2A3k1r3S36210V23
Resulting tone sound: http://cl.ly/3q2w3S3Y0x0M3i2a1W3v

Answer 1:

Turns out there were two problems here.

Buffer Processing Code

The line int32_t mixed = s1 + s2; was causing clipping. A better approach is to divide the sum by the number of channels being mixed, as in int32_t mixed = (s1 + s2) / 2;, and then normalize in a separate pass later.
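As a minimal sketch of that averaging approach, written in plain C so it can be dropped into an Objective-C file: the function name mix_average_int16 and its parameters are hypothetical, not part of the original code, and the buffers are assumed to hold interleaved 16-bit PCM samples of equal length.

```c
#include <stddef.h>
#include <stdint.h>

/* Mix two 16-bit PCM buffers by averaging them.
 * The sum is formed in 32 bits, so it cannot overflow, and the halved
 * result always fits back into int16_t. */
static void mix_average_int16(const int16_t *in1, const int16_t *in2,
                              int16_t *out, size_t num_samples)
{
    for (size_t i = 0; i < num_samples; i++) {
        int32_t sum = (int32_t)in1[i] + (int32_t)in2[i];
        /* sum is in [-65536, 65534], so sum / 2 is in [-32768, 32767] */
        out[i] = (int16_t)(sum / 2);
    }
}
```

Because the result can never leave the 16-bit range, no clipping check is needed here; the price is that the mix is quieter than either input at full scale.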

Frames != bytes

When zeroing out the second track's buffer after its audio ran out, I was specifying the offset and length in frames rather than bytes. This left garbage in the buffer and produced the noise you hear periodically. The fix is easy:

if (frameCount2 < frameCount1) {
    bzero(inBuffer2 + (frameCount2 * bytesPerFrame), (frameCount1 - frameCount2) * bytesPerFrame);
    frameCount2 = frameCount1;
}
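The same fix can be wrapped in a small helper so that the frame-to-byte conversion lives in one place. This is only a sketch: zero_pad_frames is a hypothetical name, and it assumes the buffer is addressed through a byte pointer (the declared type of inBuffer2 is not shown in the question).

```c
#include <stdint.h>
#include <string.h>

/* Zero the tail of a byte-addressed PCM buffer, from frames_valid up to
 * frames_total. Both the offset and the length are converted from
 * frames to bytes before touching memory. */
static void zero_pad_frames(uint8_t *buffer, uint32_t frames_valid,
                            uint32_t frames_total, uint32_t bytes_per_frame)
{
    if (frames_valid >= frames_total) {
        return; /* nothing to pad */
    }
    size_t offset = (size_t)frames_valid * bytes_per_frame;
    size_t length = (size_t)(frames_total - frames_valid) * bytes_per_frame;
    memset(buffer + offset, 0, length);
}
```

In the mixdown loop, a call such as zero_pad_frames(inBuffer2, frameCount2, frameCount1, bytesPerFrame) would stand in for the bzero line, under the byte-pointer assumption above.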

Now the sample is great: http://cl.ly/1E2q1L441s2b3e2X2z0J

Answer 2:

Your posted answer looks good; I can see only one minor problem. Dividing by two will prevent the clipping, but it is equivalent to applying a 50% gain reduction (about 6 dB). That is not the same as normalization, which is the process of scanning an entire audio file, finding the highest peak, and applying a gain change so that this peak hits a given level (usually 0 dBFS). The result of halving is that under normal (i.e., non-clipping) circumstances, the output signal will be quieter than it needs to be and will have to be boosted again.

During your mixdown, you no doubt encountered an overflow, which caused distortion: the value wraps around and produces a jump in the signal. What you want to do instead is apply a hard ceiling to samples that would otherwise overflow. This is sometimes called a "brick-wall limiter", although in this simple form it amounts to hard clipping. The simplest way to do this is:
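To make the distinction concrete, here is a rough sketch of peak normalization as two passes over the samples. The names peak_abs_int16 and normalize_int16 are hypothetical, and the sketch assumes the whole signal fits in one in-memory buffer; for a real file the first pass would have to read the entire file before any gain is applied.

```c
#include <stddef.h>
#include <stdint.h>

/* Pass 1: find the largest absolute sample value in the buffer. */
static int32_t peak_abs_int16(const int16_t *samples, size_t n)
{
    int32_t peak = 0;
    for (size_t i = 0; i < n; i++) {
        int32_t s = samples[i];
        int32_t a = (s < 0) ? -s : s; /* 32-bit, so -32768 is safe */
        if (a > peak) {
            peak = a;
        }
    }
    return peak;
}

/* Pass 2: scale every sample so that the peak lands on target_peak.
 * target_peak is assumed to be in the range 1..32767. */
static void normalize_int16(int16_t *samples, size_t n, int32_t target_peak)
{
    int32_t peak = peak_abs_int16(samples, n);
    if (peak == 0) {
        return; /* pure silence: nothing to scale */
    }
    for (size_t i = 0; i < n; i++) {
        int64_t scaled = (int64_t)samples[i] * target_peak / peak;
        samples[i] = (int16_t)scaled; /* |scaled| <= target_peak */
    }
}
```

Unlike a fixed divide-by-two, the gain here depends on the material: a quiet mix is boosted and a hot one is attenuated, so the loudest sample always ends up at the chosen level.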

int32_t mixed = s1 + s2;
if(mixed >= 32767) {
  mixed = 32767;
}
else if(mixed <= -32767) {
  mixed = -32767;
}

The result of this technique is that you will hear a bit of distortion around samples which are clipping, but the sound will not be completely mangled as would be the case with integer overflow. The distortion, although present, doesn't destroy the listening experience.
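Putting the clamp into a complete mixing routine might look like the sketch below. mix_clamped_int16 is a hypothetical name, and unlike the snippet above it clamps to the full int16_t range (INT16_MIN is -32768) rather than a symmetric -32767; either choice avoids wrap-around.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Mix two 16-bit PCM buffers, clamping the sum to the int16_t range.
 * Returns true if any sample had to be clamped, so that the caller
 * can warn the user. */
static bool mix_clamped_int16(const int16_t *in1, const int16_t *in2,
                              int16_t *out, size_t num_samples)
{
    bool clipped = false;
    for (size_t i = 0; i < num_samples; i++) {
        int32_t mixed = (int32_t)in1[i] + (int32_t)in2[i];
        if (mixed > INT16_MAX) {
            mixed = INT16_MAX;
            clipped = true;
        } else if (mixed < INT16_MIN) {
            mixed = INT16_MIN;
            clipped = true;
        }
        out[i] = (int16_t)mixed; /* now guaranteed to fit */
    }
    return clipped;
}
```

The return value plays the same role as the clipping flag in the question's mixBuffersWithBuffer1: method, except that the stored sample is now clamped before the cast instead of being allowed to wrap.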