How to tell if a binary sequence is x86 machine code

Posted 2019-07-17 03:53

Question:

We all know that on the x86 architecture, data and code are mixed together in memory and on disk. But how can we tell them apart?

I need the method for a paper, so I don't expect 100% accuracy. 80% would be fine; even just some ideas would help :)

Answer 1:

Statistically determine which instructions are common in executables.

For example, common instructions may include add, subtract, and so on.

For the unknown binary sequence, treat it as machine code and look at the frequency of the various instructions used (here you can probably assume that decoding starts at a correct instruction boundary).

If an invalid instruction is encountered, the sequence is obviously not machine code.

Otherwise, see whether the percentage frequency of the instructions used matches what would be usual.
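
The frequency comparison could be sketched in Python roughly as follows. This is only a sketch under stated assumptions: it expects a text listing in ndisasm's default layout (address, hex bytes, instruction text), the names `mnemonic_profile` and `cosine_similarity` are made up for illustration, and cosine similarity is just one possible way to compare two frequency profiles. The reference profile would have to be measured from executables that are known to contain code.

```python
import math
from collections import Counter


def mnemonic_profile(listing):
    """Return (relative frequency per mnemonic, fraction of 'db' lines).

    Assumes ndisasm-style lines: address, hex bytes, instruction text.
    Prefixes such as 'rep' or 'lock' are counted as the mnemonic.
    """
    counts = Counter()
    for line in listing.splitlines():
        fields = line.split(None, 2)  # address, hex bytes, instruction text
        if len(fields) < 3:
            continue  # blank line or continuation of a long instruction
        counts[fields[2].split()[0]] += 1
    total = sum(counts.values())
    if total == 0:
        return {}, 0.0
    profile = {m: n / total for m, n in counts.items()}
    # ndisasm prints bytes it cannot decode as 'db' pseudo-instructions.
    return profile, profile.get("db", 0.0)


def cosine_similarity(p, q):
    """Cosine similarity of two frequency profiles (1.0 = same shape)."""
    dot = sum(v * q.get(m, 0.0) for m, v in p.items())
    norm = (math.sqrt(sum(v * v for v in p.values()))
            * math.sqrt(sum(v * v for v in q.values())))
    return dot / norm if norm else 0.0
```

One way to use it: build a reference profile from the listing of a known executable, build a second profile from the unknown sequence, and treat a low similarity or a high fraction of `db` lines as a hint that the sequence is data. Where to put the threshold is a judgment call that would need to be tuned on real samples.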


Also, when an instruction takes operands (e.g. registers or memory/data locations), record them. Then check whether the same locations are accessed again nearby.

This can be done by sorting the locations used by descending frequency of use, and seeing if the shape of the falling frequency curve roughly matches what would be usual.
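
A minimal Python sketch of this ranking step, again assuming an ndisasm-style listing as input. The function names `ranked_operands` and `top_share` are hypothetical, and the regular expression only recognises bracketed memory operands and the common 16/32-bit general-purpose registers, so it is a simplification rather than a full operand parser.

```python
import re
from collections import Counter

# Bracketed memory operands, or common 16/32-bit general-purpose registers.
# Registers inside brackets are counted as part of the memory operand.
OPERAND = re.compile(r"\[[^\]]+\]|\b(?:e?[abcd]x|e?[sd]i|e?[sb]p)\b")


def ranked_operands(listing):
    """Return (operand, count) pairs sorted by descending frequency."""
    counts = Counter()
    for line in listing.splitlines():
        fields = line.split(None, 2)  # address, hex bytes, instruction text
        if len(fields) < 3:
            continue
        parts = fields[2].split(None, 1)  # mnemonic, operand text
        if len(parts) == 2:
            counts.update(OPERAND.findall(parts[1]))
    return counts.most_common()


def top_share(ranked, k):
    """Fraction of all operand references that go to the k most used ones."""
    total = sum(n for _, n in ranked)
    return sum(n for _, n in ranked[:k]) / total if total else 0.0
```

The idea is that real code tends to reuse a few registers and stack slots heavily, so `top_share` for a small `k` can be compared with the value measured on known code. Which value counts as "usual" is not given here and would have to be measured.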


Data (non-machine code) is unlikely to match these statistical tests.

Note that the match can be very loose. Even if the sequence is quite a bit off what is normal, it is probably still code, unless there is almost no statistical correlation at all.



Answer 2:

See the question "Is all data valid x86 16-bit machine code?".

  1. Put your data in a file (called data.bin below)
  2. Run ndisasm -b 32 data.bin > program.dump (use 16, 32 or 64 as applicable, of course)
  3. Remove the addresses and the machine code in hex: cut -b29- < program.dump > program.dump2
  4. If you used 64 bits above, long instructions will wrap onto a second line, which the previous step leaves empty; remove those empty lines now (the command is harmless otherwise): grep -v '^$' < program.dump2 > program.asm
  5. (The file can now be assembled)
  6. To determine whether it consists only of instructions, run grep -q '^db' program.asm; echo $?
  7. If you see 0, it is not all instructions (grep found a db line). If you see 1, it is :)
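
The same steps could be scripted in Python, roughly as sketched below. The assumptions: ndisasm is installed and on the PATH, its `-b` option selects 16, 32 or 64 bits, the instruction text starts at column 29 of each line (the column the cut command above relies on), and the function names are invented for this example.

```python
import subprocess


def disassemble(path, bits=32):
    """Run ndisasm on a file and return its text listing.

    Requires ndisasm on the PATH; raises CalledProcessError on failure.
    """
    result = subprocess.run(
        ["ndisasm", "-b", str(bits), path],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def instruction_lines(listing):
    """Drop the address and hex columns (like cut -b29-) and empty lines."""
    stripped = (line[28:].strip() for line in listing.splitlines())
    return [text for text in stripped if text]


def is_all_instructions(listing):
    """True if no line of the listing is a 'db' pseudo-instruction."""
    return not any(
        text.split()[0] == "db" for text in instruction_lines(listing)
    )
```

Calling `is_all_instructions(disassemble("data.bin"))` would then give the same yes/no answer as the grep check, where `data.bin` is a placeholder file name. Note that an empty listing also counts as "all instructions" here, just as it does with grep.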