What is `objc_msgSend_fixup`, exactly?

Posted 2020-05-27 09:39

Question:

I'm messing around with the Objective-C runtime, trying to compile Objective-C code without linking it against libobjc, and I'm getting segmentation faults in a program, so I generated an assembly file from it. I don't think it's necessary to show the whole assembly file. At some point in my main function, I've got the following line (which, by the way, is the line after which I get the segfault):

callq   *l_objc_msgSend_fixup_alloc

And here is the definition of l_objc_msgSend_fixup_alloc:

.hidden l_objc_msgSend_fixup_alloc # @"\01l_objc_msgSend_fixup_alloc"
    .type   l_objc_msgSend_fixup_alloc,@object
    .section    "__DATA, __objc_msgrefs, coalesced","aw",@progbits
    .weak   l_objc_msgSend_fixup_alloc
    .align  16
l_objc_msgSend_fixup_alloc:
    .quad   objc_msgSend_fixup
    .quad   L_OBJC_METH_VAR_NAME_
    .size   l_objc_msgSend_fixup_alloc, 16

I've reimplemented objc_msgSend_fixup as a function (id objc_msgSend_fixup(id self, SEL op, ...)) which returns nil (just to see what happens), but this function isn't even being called (the program crashes before calling it).

So, my question is, what is callq *l_objc_msgSend_fixup_alloc supposed to do and what is objc_msgSend_fixup (after l_objc_msgSend_fixup_alloc:) supposed to be (a function or an object)?

Edit

To explain better: I'm not linking my source file against the objc library. What I'm trying to do is implement some parts of the library myself, just to see how it works. Here is roughly what I've done:

#include <stdio.h>
#include <objc/runtime.h>

@interface MyClass {

}
+(id) alloc;
@end

@implementation MyClass
+(id) alloc {
  // alloc the object
  return nil;
}
@end

id objc_msgSend_fixup(id self, SEL op, ...) {
  printf("Calling objc_msgSend_fixup()...\n");

  // looks for the method implementation for SEL in self's method list

  return nil;   // Since this is just a test, this function doesn't need to do that
}

int main(int argc, char *argv[]) {
    MyClass *m;
    m = [MyClass alloc];    // At this point, according to the assembly code generated
    // objc_msgSend_fixup should be called. So, the program should, at least, print
    // "Calling objc_msgSend_fixup()..." on the screen, but it crashes before
    // objc_msgSend_fixup() is called...

    return 0;
}

If the runtime needs to access the object's vtable, or the method list of the object's class, to find the correct method to call, which function actually does this? I think it is objc_msgSend_fixup, in this case. So, when objc_msgSend_fixup is called, it receives an object as one of its parameters, and if this object hasn't been initialized, the function fails.

So, I've implemented my own version of objc_msgSend_fixup. According to the assembly source above, it should be called. It doesn't matter whether the function actually looks up the implementation of the selector passed as a parameter; I just want objc_msgSend_fixup to be called. But it's not being called: the function that looks up the object's data is never reached at all, rather than being called and then causing a fault (it returns nil, which, by the way, doesn't matter). The program segfaults before objc_msgSend_fixup is called...

Edit 2

A more complete assembly snippet:

.globl  main
    .align  16, 0x90
    .type   main,@function
main:                                   # @main
.Ltmp20:
    .cfi_startproc
# BB#0:
    pushq   %rbp
.Ltmp21:
    .cfi_def_cfa_offset 16
.Ltmp22:
    .cfi_offset %rbp, -16
    movq    %rsp, %rbp
.Ltmp23:
    .cfi_def_cfa_register %rbp
    subq    $32, %rsp
    movl    $0, %eax
    leaq    l_objc_msgSend_fixup_alloc, %rcx
    movl    $0, -4(%rbp)
    movl    %edi, -8(%rbp)
    movq    %rsi, -16(%rbp)
    movq    L_OBJC_CLASSLIST_REFERENCES_$_, %rsi
    movq    %rsi, %rdi
    movq    %rcx, %rsi
    movl    %eax, -28(%rbp)         # 4-byte Spill
    callq   *l_objc_msgSend_fixup_alloc
    movq    %rax, -24(%rbp)
    movl    -28(%rbp), %eax         # 4-byte Reload
    addq    $32, %rsp
    popq    %rbp
    ret

For l_objc_msgSend_fixup_alloc, we have:

.hidden l_objc_msgSend_fixup_alloc # @"\01l_objc_msgSend_fixup_alloc"
    .type   l_objc_msgSend_fixup_alloc,@object
    .section    "__DATA, __objc_msgrefs, coalesced","aw",@progbits
    .weak   l_objc_msgSend_fixup_alloc
    .align  16
l_objc_msgSend_fixup_alloc:
    .quad   objc_msgSend_fixup
    .quad   L_OBJC_METH_VAR_NAME_
    .size   l_objc_msgSend_fixup_alloc, 16

For L_OBJC_CLASSLIST_REFERENCES_$_:

.type   L_OBJC_CLASSLIST_REFERENCES_$_,@object # @"\01L_OBJC_CLASSLIST_REFERENCES_$_"
    .section    "__DATA, __objc_classrefs, regular, no_dead_strip","aw",@progbits
    .align  8
L_OBJC_CLASSLIST_REFERENCES_$_:
    .quad   OBJC_CLASS_$_MyClass
    .size   L_OBJC_CLASSLIST_REFERENCES_$_, 8

OBJC_CLASS_$_MyClass is a pointer to the MyClass struct definition, which has also been generated by the compiler and is also present in the assembly code.

Answer 1:

To understand what objc_msgSend_fixup is and what it does, it's necessary to know exactly how message sending is performed in Objective-C. Every ObjC programmer has heard at some point that the compiler transforms [obj message] statements into objc_msgSend(obj, sel_registerName("message")) calls. However, that's not entirely accurate.

To better illustrate my explanation, consider the following ObjC snippet:

[obj mesgA];
[obj mesgB];

[obj mesgA];
[obj mesgB];

In this snippet, two messages are sent to obj, each of which is sent twice. So, you might imagine that the following code is generated:

objc_msgSend(obj, sel_registerName("mesgA"));
objc_msgSend(obj, sel_registerName("mesgB"));
objc_msgSend(obj, sel_registerName("mesgA"));
objc_msgSend(obj, sel_registerName("mesgB"));

However, sel_registerName may be costly, and calling it every time a message is sent is not a smart thing to do. Instead, the compiler generates a structure like this for each message to be sent:

typedef struct message_ref {
    id (*trampoline) (id obj, struct message_ref *ref, ...);
    union {
        const char *str;
        SEL sel;
    };
} message_ref;

So, in the example above, when the program starts, we have something like this:

message_ref l_objc_msgSend_fixup_mesgA = { &objc_msgSend_fixup, "mesgA" };
message_ref l_objc_msgSend_fixup_mesgB = { &objc_msgSend_fixup, "mesgB" };

When these messages need to be sent to obj, the compiler generates code equivalent to the following:

l_objc_msgSend_fixup_mesgA.trampoline(obj, &l_objc_msgSend_fixup_mesgA);   // [obj mesgA];
l_objc_msgSend_fixup_mesgB.trampoline(obj, &l_objc_msgSend_fixup_mesgB);   // [obj mesgB];

At program startup, the message reference trampolines point to the objc_msgSend_fixup function. For each message_ref, the first time its trampoline pointer is invoked, objc_msgSend_fixup gets called, receiving obj (the object the message has to be sent to) and the message_ref structure it was called through.

So, what objc_msgSend_fixup must do is get the selector for the message being sent. Since this has to be done only once per message reference, objc_msgSend_fixup must also replace the ref's trampoline field with a pointer to another function that doesn't fix up the selector. That function is called objc_msgSend_fixedup (the selector has been fixed up). Now that the message selector has been set and doesn't need to be resolved again, objc_msgSend_fixup just calls objc_msgSend_fixedup, which in turn calls objc_msgSend. After that, if the message ref's trampoline is called again, its selector is already fixed, and objc_msgSend_fixedup is the one that gets called.

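In short, we could write objc_msgSend_fixup and objc_msgSend_fixedup roughly like this (C cannot forward a variadic argument list to another function, so the real runtime implements these trampolines in assembly; the comments mark where the remaining message arguments would be passed along):

id objc_msgSend_fixedup(id obj, struct message_ref *ref, ...);

id objc_msgSend_fixup(id obj, struct message_ref *ref, ...) {
    ref->sel = sel_registerName(ref->str);     // resolve the selector once
    ref->trampoline = &objc_msgSend_fixedup;   // later sends skip this step
    return objc_msgSend_fixedup(obj, ref /*, remaining args */);
}

id objc_msgSend_fixedup(id obj, struct message_ref *ref, ...) {
    return objc_msgSend(obj, ref->sel /*, remaining args */);
}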

This makes message sending a lot faster, since the appropriate selector is looked up only the first time the message is sent (by objc_msgSend_fixup). On later sends, the selector has already been resolved and the message is dispatched directly with objc_msgSend (by objc_msgSend_fixedup).

In the question's assembly code, l_objc_msgSend_fixup_alloc is the message_ref structure for the alloc message, and the segmentation fault may have been caused by a problem in its first field (maybe it's not pointing to objc_msgSend_fixup...).



Answer 2:

Ok, your code is Objective-C, not C.

Edit / About objc_msgSend_fixup

objc_msgSend_fixup is internal Objective-C runtime stuff, used to manage calls using a C++ style method vtable.

You may read some articles about this here:

  • http://cocoasamurai.blogspot.ch/2010/01/understanding-objective-c-runtime.html
  • http://www.sealiesoftware.com/blog/

Edit / End

Now about your segfault.

Objective-C uses a runtime for message passing, allocations, etc.

Message passing (method call) is usually done by the objc_msgSend function.
That's what is used when you do:

[ someObject someFunction: someArg ];

It's translated to:

objc_msgSend( someObject, @selector( someFunction: ), someArg );

So if you have a segfault in such a runtime function, such as objc_msgSend_fixup_alloc, it certainly means you are calling a method on an uninitialized pointer (if not using ARC), or on a freed object.

Something like:

NSObject * o;

[ o retain ]; // Will segfault somewhere in the Obj-C runtime in non ARC, as 'o' may point to anything.

Or:

NSObject * o;

o = [ [ NSObject alloc ] init ];

[ o release ];
[ o retain ]; // Will segfault somewhere in the Obj-C runtime as 'o' is no longer a valid object address.

So even if the segfault location is in the runtime, this is certainly a basic Objective-C memory management issue, in your own code.

Try enabling zombies (the NSZombieEnabled environment variable); it should help.
Also try the static analyzer.

Edit 2

It's crashing in the runtime, because the runtime needs to access the object's vtable to find the correct method to call.

As the object is invalid, the vtable lookup results in the dereference of an invalid pointer.

This is why the segfault is located here.

Edit 3

You say you're not linked with the objc library.
What do you mean by the «objc library»?

I'm asking because, as we can see in your code, you are definitely using an Objective-C compiler.

You may not link with the «Foundation» framework, for instance, which provides the base objects, but since you're using an Objective-C compiler, the libobjc library (providing the runtime) will still be implicitly linked.

Are you sure that's not the case? Try running nm on the resulting binary.

Edit 4

If that's really the case, objc_msgSend_fixup is not the first function you need to write in order to recreate the runtime.

As you define a class, the runtime needs to know about it, so you need to code stuff like objc_allocateClassPair and friends.

You'll also need to ensure the compiler won't use shortcuts.

I've seen stuff like L_OBJC_CLASSLIST_REFERENCES_$_ in your code.

Does this symbol exist in your own version?