Debugging over-released objects, problem with NSZo

Posted 2019-01-18 07:38

Question:

EDIT: I have found the cause of this crash! bbum pointed out that buffer overflows are a very common cause of this, so I looked at the only raw buffer malloc I had:

closedList = (AINavigationCell **)malloc(baseCells.count * sizeof(AINavigationCell *));

I was later writing past the end of this array, which should have been allocated with far more than baseCells.count entries. Thank you bbum!
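For the archive, here is a minimal sketch in plain C of one way to guard against this kind of overrun. The names `CellList`, `cell_list_create`, `cell_list_append` and `cell_list_destroy` are hypothetical and not from my project, and `void *` stands in for `AINavigationCell *`. The idea is to keep the capacity next to the buffer and refuse any write that would land past it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical container: a heap buffer of pointers that remembers its capacity. */
typedef struct {
    void **items;
    size_t count;
    size_t capacity;
} CellList;

/* Allocate room for `capacity` pointers. Returns false if allocation fails. */
bool cell_list_create(CellList *list, size_t capacity) {
    assert(list != NULL);
    /* Ask for at least one slot so a zero-capacity request has defined behavior. */
    list->items = calloc(capacity ? capacity : 1, sizeof *list->items);
    list->count = 0;
    list->capacity = list->items ? capacity : 0;
    return list->items != NULL;
}

/* Append one pointer. Refuses the write instead of running past the end. */
bool cell_list_append(CellList *list, void *cell) {
    assert(list != NULL);
    if (list->count >= list->capacity) {
        return false; /* would overflow: the caller must size the list larger */
    }
    list->items[list->count++] = cell;
    return true;
}

/* Release the buffer and reset the bookkeeping. */
void cell_list_destroy(CellList *list) {
    assert(list != NULL);
    free(list->items);
    list->items = NULL;
    list->count = 0;
    list->capacity = 0;
}
```

With a wrapper like this, an undersized allocation shows up as a failed append at the point of the mistake rather than as a crash much later in an unrelated dealloc.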

Question: I have a reproducible EXC_BAD_ACCESS during NSAutoreleasePool -drain, which seems to indicate that I am over-releasing an object. So I enabled NSZombie, but then the program does not crash any more, nor do I get any info logged to the console. If I turn NSZombie off, the crash comes back. What does this mean? I thought NSZombies were used to tackle exactly this kind of problem. If NSZombie won't help, is there another way to interrogate this over-released object?

Also, the crash is not reproducible on the Simulator, which is why I can't use Instruments with NSZombie.

The following is the backtrace at the point of the crash.

#0  0x31ac8bc8 in _cache_fill ()
#1  0x31acaf8e in lookUpMethod ()
#2  0x31ac8780 in _class_lookupMethodAndLoadCache ()
#3  0x31ac859a in objc_msgSendSuper_uncached ()
#4  0x328014f0 in -[__NSArrayReverseEnumerator dealloc] ()
#5  0x327b1f7a in -[NSObject(NSObject) release] ()
#6  0x327b63c8 in CFRelease ()
#7  0x327b58de in _CFAutoreleasePoolPop ()
#8  0x320e132c in NSPopAutoreleasePool ()
#9  0x30899048 in CAPopAutoreleasePool ()
#10 0x30902784 in CA::Display::DisplayLink::dispatch ()
#11 0x309027ea in CA::Display::IOMFBDisplayLink::callback ()
#12 0x30076bfa in IOMobileFramebufferVsyncNotifyFunc ()
#13 0x333dee6a in IODispatchCalloutFromCFMessage ()
#14 0x327e8be6 in __CFMachPortPerform ()
#15 0x327e06fe in __CFRUNLOOP_IS_CALLING_OUT_TO_A_SOURCE1_PERFORM_FUNCTION__ ()
#16 0x327e06c2 in __CFRunLoopDoSource1 ()
#17 0x327d2f7c in __CFRunLoopRun ()
#18 0x327d2c86 in CFRunLoopRunSpecific ()
#19 0x327d2b8e in CFRunLoopRunInMode ()
#20 0x3094a4aa in GSEventRunModal ()
#21 0x3094a556 in GSEventRun ()
#22 0x32c14328 in -[UIApplication _run] ()
#23 0x32c11e92 in UIApplicationMain ()
#24 0x00002556 in main (argc=1, argv=0x2fdff660) at /Users/hyn/Desktop/MyProject-trunk/main.m:14

Answer 1:

The problem you describe could be one of a couple of things; you may be over-releasing an object or you might be corrupting memory. If you corrupt memory -- corrupt the first few bytes of an object, specifically -- then it can easily manifest as a crash during an autorelease pool drain (or during any other message send).

That the crash happens on a device, but not the simulator, points to memory corruption as well. The architecture of the device [ARM] vs. the simulator [i386] is quite different and any number of issues may be at play.

Typically, memory corruption doesn't manifest itself quite so consistently.

First, post the backtrace of the crash. It might help.

Secondly, do you do any kind of raw malloc calls? Or filling buffers with data? The most common cause of such crashes is running past the end of a buffer.


#0  0x31ac8bc8 in _cache_fill ()
#1  0x31acaf8e in lookUpMethod ()
#2  0x31ac8780 in _class_lookupMethodAndLoadCache ()
#3  0x31ac859a in objc_msgSendSuper_uncached ()
#4  0x328014f0 in -[__NSArrayReverseEnumerator dealloc] ()

(The above was added after the OP fixed the problem, but -- for the archive)

That crash trace is a classic signature of memory corruption. Namely, the isa pointer -- the first pointer's worth of bytes in an object that points at the Class of the instance -- was stomped. This typically happens when you overrun a buffer in the allocation just before the object. If the overrun is only a couple of bytes, then the behavior may differ between platforms, since the malloc quanta -- the real size of the allocations (you ask for 90 bytes on one platform and you might get 96. Another? 128) -- differ between platforms and even between releases.
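To make the quanta point concrete, here is a rough arithmetic sketch in C. It assumes an allocator that simply rounds every request up to a multiple of a fixed quantum, which is a simplification; real allocators use size classes and per-block bookkeeping. The function names and the quantum values 16 and 128 are hypothetical, picked only to match the 96 and 128 figures above.

```c
#include <assert.h>
#include <stddef.h>

/* Round a request up to the next multiple of the allocator's quantum. */
size_t rounded_allocation(size_t requested, size_t quantum) {
    assert(quantum > 0);
    return (requested + quantum - 1) / quantum * quantum;
}

/* Bytes past the requested size that still belong to the same block. */
size_t slack_bytes(size_t requested, size_t quantum) {
    return rounded_allocation(requested, quantum) - requested;
}

/* Nonzero when writing `overrun` bytes past the requested size leaves the
   block and reaches whatever happens to be allocated next. */
int overrun_escapes_block(size_t requested, size_t quantum, size_t overrun) {
    return overrun > slack_bytes(requested, quantum);
}
```

Under these assumptions, a 90-byte request with a quantum of 16 leaves 6 bytes of slack, while a quantum of 128 leaves 38. An 8-byte overrun therefore escapes the block in the first case and lands harmlessly in padding in the second, which is one way the same bug can crash on one platform and stay silent on another.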

In particular, the isa was stomped with a value that looked enough like a pointer that the runtime dereferenced the garbage value and then tried to treat the resulting location as the Class's method table.
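The stomping itself can be modelled without invoking undefined behavior. The sketch below is an illustration only, not how the Objective-C runtime or malloc actually lay out memory: it places a 16-byte buffer and a pretend object's isa word back to back in a single byte array. `Arena`, `BUFFER_SIZE` and the function names are all hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

enum { BUFFER_SIZE = 16 };

/* Hypothetical model of two adjacent allocations: a raw buffer followed
   immediately by an "object" whose first word is its isa pointer. */
typedef struct {
    unsigned char bytes[BUFFER_SIZE + sizeof(uintptr_t)];
} Arena;

/* Zero the buffer and store `isa` in the word right after it. */
void arena_init(Arena *arena, uintptr_t isa) {
    assert(arena != NULL);
    memset(arena->bytes, 0, sizeof arena->bytes);
    memcpy(arena->bytes + BUFFER_SIZE, &isa, sizeof isa);
}

/* Read back the isa word of the pretend object. */
uintptr_t arena_read_isa(const Arena *arena) {
    uintptr_t isa;
    assert(arena != NULL);
    memcpy(&isa, arena->bytes + BUFFER_SIZE, sizeof isa);
    return isa;
}

/* Fill `length` bytes starting at the buffer. The buffer only owns
   BUFFER_SIZE bytes, so anything longer spills into the neighbor's isa.
   The length is clamped to the arena so the model itself stays well-defined. */
void arena_fill_buffer(Arena *arena, unsigned char value, size_t length) {
    assert(arena != NULL);
    if (length > sizeof arena->bytes) {
        length = sizeof arena->bytes;
    }
    memset(arena->bytes, value, length);
}
```

Filling exactly `BUFFER_SIZE` bytes leaves the isa intact; filling `BUFFER_SIZE + 2` bytes changes it. In a real process the next message sent to that object would then chase the garbage value, which is how an overrun can end in a backtrace like the one above.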

Any time you see a crash that is a few frames deep into one of the objc_msgSend*() functions, it is quite likely memory corruption and, if so, it will almost always be a buffer overflow.

Since it is easy to do, it is still a good idea to do a test pass with zombie detection to catch the "sometimes it is really just an over-release" cases.