-->

Reversed Mach-O 64-bit x86 Assembly analysis

Posted 2019-08-19 00:39

Question:

This question is for Intel x86 assembly experts to answer. Thanks for your effort in advance!

Problem Specification

I am analysing a binary file in Mach-O 64-bit x86 format. I am running 64-bit macOS, and the disassembly comes from objdump.

When I was learning assembly, I could see variable names like "$xxx", string values in ASCII, and callee names such as "call _printf".

But in this disassembly I can see none of these:

  1. No main function:

    Disassembly of section __TEXT,__text:
    __text:
    100000c90:  55  pushq   %rbp
    100000c91:  48 89 e5    movq    %rsp, %rbp
    100000c94:  48 83 ec 10     subq    $16, %rsp
    100000c98:  48 8d 3d bf 02 00 00    leaq    703(%rip), %rdi
    100000c9f:  b0 00   movb    $0, %al
    100000ca1:  e8 68 02 00 00  callq   616
    100000ca6:  89 45 fc    movl    %eax, -4(%rbp)
    100000ca9:  48 83 c4 10     addq    $16, %rsp
    100000cad:  5d  popq    %rbp
    100000cae:  c3  retq
    100000caf:  90  nop
    100000cb0:  55  pushq   %rbp  
    ...
    

    The code above will be executed, but I have no idea where it is executed from.

Also, I am new to AT&T syntax. Could you tell me what these instructions mean?

    0000000100000c90    pushq   %rbp
    0000000100000c98    leaq    0x2bf(%rip), %rdi       ## literal pool for: "xxxx\n"
    ...
    0000000100000cd0    callq   0x100000c90

Is it a loop? I am not sure, but it seems to be. And why are the %rip and %rdi registers used? In Intel x86 I know that EIP is the instruction pointer, but I don't understand what it means here.

  2. Call with an integer operand: whatever calling convention is used, I have never seen a code pattern like "call 616":

    "100000cd0: e8 bb ff ff ff  callq   -69 <__mh_execute_header+C90>"
    
  3. Code after ret: in Intel x86, ret means leaving the stack frame and returning control flow to the caller, so it should end an independent function. However, right after it we can see code like

    100000cae:  c3  retq
    100000caf:  90  nop
    /* new function call */
    100000cb0:  55  pushq   %rbp
    ...
    

    It is ridiculous!

  4. ASCII strings lost: I viewed the binary in hexadecimal and recognised some ASCII strings before disassembling it.

However, no ASCII strings appear in the disassembly!

  5. Overall layout of the disassembly:

    Disassembly of section __TEXT,__text:
    __text:
    from address 100000c90 to 100000ef6, 145 lines
    
    Disassembly of section __TEXT,__stubs:
    __stubs:
    from address 100000efc to 100000f14, 5 lines of asm code:
    100000efc:  ff 25 16 01 00 00   jmp qword ptr [rip + 278]
    100000f02:  ff 25 18 01 00 00   jmp qword ptr [rip + 280]
    100000f08:  ff 25 1a 01 00 00   jmp qword ptr [rip + 282]
    100000f0e:  ff 25 1c 01 00 00   jmp qword ptr [rip + 284]
    100000f14:  ff 25 1e 01 00 00   jmp qword ptr [rip + 286]
    
    Disassembly of section __TEXT,__stub_helper:
    __stub_helper:
    
    ...
    
    Disassembly of section __TEXT,__cstring:
    __cstring:
    
    ...
    
    Disassembly of section __TEXT,__unwind_info:
    __unwind_info:
    
    ...
    
    Disassembly of section __DATA,__nl_symbol_ptr:
    __nl_symbol_ptr:
    
    ...
    
    Disassembly of section __DATA,__got:
    __got:
    
    ...
    
    Disassembly of section __DATA,__la_symbol_ptr:
    __la_symbol_ptr:
    
    ...
    
    Disassembly of section __DATA,__data:
    __data:
    
    ...
    

Since it might be a virus, I cannot execute it. How should I analyse it?

Update on May 21

I have already identified where the output is produced. If I fully understand the data flow in this programme, I might be able to figure out the possible solutions.

I would appreciate it if someone could give me a detailed explanation. Thank you!

Update on May 22

I installed macOS in VirtualBox and, after setting execute permission with chmod, ran the programme. Nothing special happened except for two lines of output. The result is hidden in the binary file.

Answer 1:

  1. You don't need a main if you are not using C. The binary header contains the entry point address.
  2. Nothing special about call 616, it's just that you don't have (all) symbols. It's somewhat strange that objdump didn't calculate the address for you, but it should be 0x100000ca6+616 = 0x100000f0e.
  3. Not sure what you find ridiculous there. One function ends, another starts.
  4. That's not a question. Yes, you can create strings at runtime so you won't have them in the image. Possibly they are encrypted.
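
To make the address arithmetic in point 2 concrete, here is a minimal Python sketch. It assumes each instruction ends in a little-endian signed 32-bit displacement (true for the `e8` near call and for the RIP-relative `leaq` quoted in the question) and that the displacement is relative to the address of the next instruction. The helper name `rel32_target` is made up for illustration; the byte strings are copied from the objdump listing above.

```python
import struct


def rel32_target(insn_addr: int, insn_bytes: bytes) -> int:
    """Absolute address referenced by an instruction ending in a 32-bit displacement."""
    # The last four bytes are a little-endian signed 32-bit displacement.
    (disp,) = struct.unpack("<i", insn_bytes[-4:])
    # The displacement is added to the address of the NEXT instruction.
    next_insn = insn_addr + len(insn_bytes)
    return next_insn + disp


# Bytes taken from the listing in the question.
call_616 = rel32_target(0x100000CA1, bytes.fromhex("e868020000"))      # callq 616
call_m69 = rel32_target(0x100000CD0, bytes.fromhex("e8bbffffff"))      # callq -69
lea_703 = rel32_target(0x100000C98, bytes.fromhex("488d3dbf020000"))   # leaq 703(%rip), %rdi

print(hex(call_616))  # 0x100000f0e
print(hex(call_m69))  # 0x100000c90
print(hex(lea_703))   # 0x100000f5e
```

Reading the results against the listing: 0x100000f0e is one of the `jmp` entries in the `__stubs` section, so "call 616" looks like an ordinary call to an imported function through its stub. 0x100000c90 is the start of the first function shown, so `callq -69` is a plain call to that function rather than a loop. 0x100000f5e is the address the `leaq` loads into %rdi, which is consistent with the "literal pool" annotation in the second listing.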