Reduce Peak Memory Usage With @autoreleasepool

2019-02-02 17:10发布

I work on an iPad application that has a sync process that uses web services and Core Data in a tight loop. To reduce the memory footprint according to Apple's Recomendation I allocate and drain an NSAutoreleasePool periodically. This currently works great and there are no memory issues with the current application. However, I plan on moving to ARC where the NSAutoreleasePool is no longer valid and would like to maintain this same kind of performance. I created a few examples and timed them and I am wondering what is the best approach, using ARC, to acheive the same kind of performance and maintain code readability.

For testing purposes I came up with 3 scenarios, each create a string using a number between 1 and 10,000,000. I ran each example 3 times to determine how long they took using a Mac 64 bit application with the Apple LLVM 3.0 compiler (w/o gdb -O0) and XCode 4.2. I also ran each example through instruments to see roughly what the memory peak was.

Each of the examples below are contained within the following code block:

int main (int argc, const char * argv[])
{
    @autoreleasepool {
        NSDate *now = [NSDate date];

        //Code Example ...

        NSTimeInterval interval = [now timeIntervalSinceNow];
        printf("Duration: %f\n", interval);
    }
}

NSAutoreleasePool Batch [Original Pre-ARC] (Peak Memory: ~116 KB)

    static const NSUInteger BATCH_SIZE = 1500;
    NSAutoreleasePool *pool = [[NSAutoreleasePool alloc] init];
    for(uint32_t count = 0; count < MAX_ALLOCATIONS; count++)
    {
        NSString *text = [NSString stringWithFormat:@"%u", count + 1U];
        [text class];

        if((count + 1) % BATCH_SIZE == 0)
        {
            [pool drain];
            pool = [[NSAutoreleasePool alloc] init];
        }
    }
    [pool drain];

Run Times:
10.928158
10.912849
11.084716


Outer @autoreleasepool (Peak Memory: ~382 MB)

    @autoreleasepool {
        for(uint32_t count = 0; count < MAX_ALLOCATIONS; count++)
        {
            NSString *text = [NSString stringWithFormat:@"%u", count + 1U];
            [text class];
        }
    }

Run Times:
11.489350
11.310462
11.344662


Inner @autoreleasepool (Peak Memory: ~61.2KB)

    for(uint32_t count = 0; count < MAX_ALLOCATIONS; count++)
    {
        @autoreleasepool {
            NSString *text = [NSString stringWithFormat:@"%u", count + 1U];
            [text class];
        }
    }

Run Times:
14.031112
14.284014
14.099625


@autoreleasepool w/ goto (Peak Memory: ~115KB)

    static const NSUInteger BATCH_SIZE = 1500;
    uint32_t count = 0;

    next_batch:
    @autoreleasepool {
        for(;count < MAX_ALLOCATIONS; count++)
        {
            NSString *text = [NSString stringWithFormat:@"%u", count + 1U];
            [text class];
            if((count + 1) % BATCH_SIZE == 0)
            {
                count++; //Increment count manually
                goto next_batch;
            }
        }
    }

Run Times:
10.908756
10.960189
11.018382

The goto statement offered the closest performance, but it uses a goto. Any thoughts?

Update:

Note: The goto statement is a normal exit for an @autoreleasepool as stated in the documentation and will not leak memory.

On entry, an autorelease pool is pushed. On normal exit (break, return, goto, fall-through, and so on) the autorelease pool is popped. For compatibility with existing code, if exit is due to an exception, the autorelease pool is not popped.

2条回答
Deceive 欺骗
2楼-- · 2019-02-02 17:38

Note that ARC enables significant optimizations which are not enabled at -O0. If you're going to measure performance under ARC, you must test with optimizations enabled. Otherwise, you'll be measuring your hand-tuned retain/release placement against ARC's "naive mode".

Run your tests again with optimizations and see what happens.

Update: I was curious, so I ran it myself. These are the runtime results in Release mode (-Os), with 7,000,000 allocations.

arc-perf[43645:f803] outer: 8.1259
arc-perf[43645:f803] outer: 8.2089
arc-perf[43645:f803] outer: 9.1104

arc-perf[43645:f803] inner: 8.4817
arc-perf[43645:f803] inner: 8.3687
arc-perf[43645:f803] inner: 8.5470

arc-perf[43645:f803] withGoto: 7.6133
arc-perf[43645:f803] withGoto: 7.7465
arc-perf[43645:f803] withGoto: 7.7007

arc-perf[43645:f803] non-ARC: 7.3443
arc-perf[43645:f803] non-ARC: 7.3188
arc-perf[43645:f803] non-ARC: 7.3098

And the memory peaks (only run with 100,000 allocations, because Instruments was taking forever):

Outer: 2.55 MB
Inner: 723 KB
withGoto: ~747 KB
Non-ARC: ~748 KB

These results surprise me a little. Well, the memory peak results don't; it's exactly what you'd expect. But the run time difference between inner and withGoto, even with optimizations enabled, is higher than what I would anticipate.

Of course, this is somewhat of a pathological micro-test, which is very unlikely to model real-world performance of any application. The takeaway here is that ARC may indeed some amount of overhead, but you should always measure your actual application before making assumptions.

(Also, I tested @ipmcc's answer using nested for loops; it behaved almost exactly like the goto version.)

查看更多
聊天终结者
3楼-- · 2019-02-02 17:51

The following should achieve the same thing as the goto answer without the goto:

for (NSUInteger count = 0; count < MAX_ALLOCATIONS;)
{
    @autoreleasepool
    {
        for (NSUInteger j = 0; j < BATCH_SIZE && count < MAX_ALLOCATIONS; j++, count++)
        {
            NSString *text = [NSString stringWithFormat:@"%u", count + 1U];
            [text class];
        }
    }
}
查看更多
登录 后发表回答