I work on an iPad application that has a sync process that uses web services and Core Data in a tight loop. To reduce the memory footprint, following Apple's recommendation, I allocate and drain an NSAutoreleasePool periodically. This currently works well, and there are no memory issues in the current application. However, I plan on moving to ARC, where NSAutoreleasePool can no longer be used directly, and I would like to keep the same kind of performance. I created a few examples and timed them, and I am wondering what the best approach is, using ARC, to achieve the same performance while keeping the code readable.
For testing purposes I came up with 3 ARC scenarios (plus the original pre-ARC code as a baseline), each of which creates a string from a number between 1 and 10,000,000. I ran each example 3 times to measure how long it took, as a 64-bit Mac application built with the Apple LLVM 3.0 compiler (without GDB, -O0) and Xcode 4.2. I also ran each example through Instruments to see roughly what the peak memory was.
Each of the examples below runs inside the following code block:
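#import <Foundation/Foundation.h>

// Assumed value, from the 1 to 10,000,000 range described above
#define MAX_ALLOCATIONS 10000000U

int main (int argc, const char * argv[])
{
    @autoreleasepool {
        NSDate *now = [NSDate date];
        //Code Example ...
        // timeIntervalSinceNow is negative for a date in the past, so negate it
        NSTimeInterval interval = -[now timeIntervalSinceNow];
        printf("Duration: %f\n", interval);
    }
    return 0;
}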
NSAutoreleasePool Batch [Original Pre-ARC] (Peak Memory: ~116 KB)
static const NSUInteger BATCH_SIZE = 1500;
NSAutoreleasePool *pool = [[NSAutoreleasePool alloc] init];
for(uint32_t count = 0; count < MAX_ALLOCATIONS; count++)
{
    NSString *text = [NSString stringWithFormat:@"%u", count + 1U];
    [text class];
    // Every BATCH_SIZE iterations, release this batch and start a new pool
    if((count + 1) % BATCH_SIZE == 0)
    {
        [pool drain];
        pool = [[NSAutoreleasePool alloc] init];
    }
}
[pool drain];
Run times (seconds):
10.928158
10.912849
11.084716
Outer @autoreleasepool (Peak Memory: ~382 MB)
@autoreleasepool {
    for(uint32_t count = 0; count < MAX_ALLOCATIONS; count++)
    {
        NSString *text = [NSString stringWithFormat:@"%u", count + 1U];
        [text class];
    }
}
Run times (seconds):
11.489350
11.310462
11.344662
Inner @autoreleasepool (Peak Memory: ~61.2 KB)
for(uint32_t count = 0; count < MAX_ALLOCATIONS; count++)
{
    @autoreleasepool {
        NSString *text = [NSString stringWithFormat:@"%u", count + 1U];
        [text class];
    }
}
Run times (seconds):
14.031112
14.284014
14.099625
@autoreleasepool w/ goto (Peak Memory: ~115 KB)
static const NSUInteger BATCH_SIZE = 1500;
uint32_t count = 0;
next_batch:
@autoreleasepool {
    for(;count < MAX_ALLOCATIONS; count++)
    {
        NSString *text = [NSString stringWithFormat:@"%u", count + 1U];
        [text class];
        if((count + 1) % BATCH_SIZE == 0)
        {
            count++; // goto skips the loop's own increment, so do it here
            goto next_batch; // leaving the block pops the pool
        }
    }
}
Run times (seconds):
10.908756
10.960189
11.018382
The goto version came closest to the original performance, but it relies on goto. Any thoughts?
Update:
Note: Jumping out with goto is a normal exit from an @autoreleasepool block, as stated in the documentation, so it will not leak memory:
On entry, an autorelease pool is pushed. On normal exit (break, return, goto, fall-through, and so on) the autorelease pool is popped. For compatibility with existing code, if exit is due to an exception, the autorelease pool is not popped.
Note that ARC enables significant optimizations that are not enabled at -O0. If you're going to measure performance under ARC, you must test with optimizations enabled. Otherwise, you'll be measuring your hand-tuned retain/release placement against ARC's "naive mode". Run your tests again with optimizations and see what happens.
Update: I was curious, so I ran it myself. These are the runtime results in Release mode (-Os), with 7,000,000 allocations.
And the memory peaks (only run with 100,000 allocations, because Instruments was taking forever):
These results surprise me a little. Well, the memory peak results don't; they're exactly what you'd expect. But the run-time difference between inner and withGoto, even with optimizations enabled, is higher than I would have anticipated. Of course, this is a somewhat pathological micro-benchmark, which is very unlikely to model the real-world performance of any application. The takeaway here is that ARC may indeed add some amount of overhead, but you should always measure your actual application before making assumptions.
(Also, I tested @ipmcc's answer using nested for loops; it behaved almost exactly like the goto version.)
The following should achieve the same thing as the goto answer without the goto:
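A minimal sketch of the nested-loop approach, assuming ARC: an outer loop steps through the range one batch at a time, and an inner loop does the work inside its own @autoreleasepool, so one pool is popped per batch of BATCH_SIZE iterations, the same cadence as the goto version. The helper name RunBatched and its return value (the number of batches, useful for checking the partitioning) are hypothetical; MAX_ALLOCATIONS and BATCH_SIZE reuse the values from the question.

```objc
#import <Foundation/Foundation.h>

// Values taken from the question's setup (assumed, not from the answer)
#define MAX_ALLOCATIONS 10000000U
static const uint32_t BATCH_SIZE = 1500;

// Hypothetical helper: runs the benchmark workload, popping one
// @autoreleasepool per batch instead of using goto. Returns the number
// of batches processed so the partitioning can be checked.
static NSUInteger RunBatched(uint32_t maxAllocations, uint32_t batchSize)
{
    NSUInteger batches = 0;
    // Outer loop: advance one whole batch at a time
    for (uint32_t batchStart = 0; batchStart < maxAllocations; batchStart += batchSize)
    {
        // The last batch may be shorter than batchSize
        uint32_t batchEnd = MIN(batchStart + batchSize, maxAllocations);

        // Objects autoreleased in here are released when the block exits,
        // which plays the role of [pool drain] in the pre-ARC version
        @autoreleasepool {
            for (uint32_t count = batchStart; count < batchEnd; count++)
            {
                NSString *text = [NSString stringWithFormat:@"%u", count + 1U];
                [text class];
            }
        }
        batches++;
    }
    return batches;
}

int main(int argc, const char * argv[])
{
    @autoreleasepool {
        NSDate *now = [NSDate date];
        NSUInteger batches = RunBatched(MAX_ALLOCATIONS, BATCH_SIZE);
        // timeIntervalSinceNow is negative for a past date, so negate it
        NSTimeInterval interval = -[now timeIntervalSinceNow];
        printf("Batches: %lu, Duration: %f\n", (unsigned long)batches, interval);
    }
    return 0;
}
```

Unlike the goto version, count is never incremented by hand, and the batch boundaries are visible in the loop headers. Whether its timing actually matches the goto version should be measured with optimizations enabled, for the reasons given earlier.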