Obfuscating python bytecode through interpreter mu

Posted 2019-02-03 18:43

Question:

Dropbox did this very well: they managed to secure their desktop application, which is written in Python. I have researched this a lot, but found no solution better than obfuscation, which is not a very secure way to go; you will likely end up seeing your code uploaded somewhere.

I listened to a talk by Giovanni Bajo (the PyInstaller founder), in which he said Dropbox does this:

  1. Bytecode scrambling by recompiling the CPython interpreter, so that the standard CPython interpreter cannot run the resulting bytecode; only the recompiled interpreter can.
  2. All you need to do is shuffle the numbers below the define loadup 8.
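To make the idea of "shuffling the numbers" concrete, here is a minimal sketch that permutes the opcode numbers of whatever Python runs it and prints a few of them as `#define` lines. The function name `shuffle_opcodes`, the seed, and the choice to shuffle the "takes an argument" and "no argument" groups separately are my own assumptions for illustration (the latter assumes the interpreter still relies on the `HAVE_ARGUMENT` threshold). A real build would need more than a new header, so treat this as a sketch, not a recipe.

```python
import opcode
import random


def shuffle_opcodes(opmap, have_argument, seed):
    """Return a new name -> number mapping with the numbers permuted.

    Numbers below `have_argument` and numbers at or above it are shuffled
    separately, so an opcode that takes an argument stays in that range.
    """
    rng = random.Random(seed)
    shuffled = {}
    for takes_arg in (False, True):
        names = sorted(
            name for name, num in opmap.items()
            if (num >= have_argument) == takes_arg
        )
        numbers = [opmap[name] for name in names]
        rng.shuffle(numbers)
        shuffled.update(zip(names, numbers))
    return shuffled


# Keep only one-byte opcodes; newer CPython versions also list pseudo-ops >= 256.
real_ops = {name: num for name, num in opcode.opmap.items() if num < 256}
scrambled = shuffle_opcodes(real_ops, opcode.HAVE_ARGUMENT, seed=1234)

# Print the first few entries in the style of the #define lines in opcode.h.
for name, num in sorted(scrambled.items(), key=lambda item: item[1])[:5]:
    print(f"#define {name:<24} {num}")
```

The seed makes the permutation reproducible, which matters because the interpreter and the tool that compiles the `.pyc` files must agree on the same table.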

I have never gone through Python's source code, so I will not claim that I fully understand the above.

I would like to hear from experts: how can this be done? And after recompiling, will I still be able to package my application with the available tools, such as PyInstaller?

Update:

I did some research on how Dropbox does this kind of obfuscation/mutation, and I found this:

According to Hagen Fritsch, they do it in two stages:

  1. They use TEA cipher along with an RNG seeded by some values in the code object of each python module. They adjusted the interpreter accordingly so that it

    a) Decrypts the modules and

    b) Prevents access to the decrypted code-objects.

    This would have been the straightforward path just letting dropbox decrypt everything and dump the modules using the builtin marshaller.

  2. Another trick used is the manual scrambling of the opcodes. Unfortunately this could only be fixed semiautomatically thus their monoalphabetic substitution cipher proved quite effective in terms of winning some time.
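For reference, this is roughly what the TEA block cipher mentioned in stage 1 looks like. It is a sketch of the textbook algorithm (64-bit block, 128-bit key, 32 rounds) in Python; the big-endian byte order, the sample key, and the sample block are my assumptions. How Dropbox derives keys from the code object, seeds its RNG, or chains blocks is not described above and is not shown here.

```python
import struct

MASK = 0xFFFFFFFF      # keep arithmetic within 32 bits
DELTA = 0x9E3779B9     # TEA's key schedule constant
ROUNDS = 32


def tea_encrypt_block(block, key):
    """Encrypt one 8-byte block with a 16-byte key."""
    v0, v1 = struct.unpack(">2I", block)
    k0, k1, k2, k3 = struct.unpack(">4I", key)
    total = 0
    for _ in range(ROUNDS):
        total = (total + DELTA) & MASK
        v0 = (v0 + (((v1 << 4) + k0) ^ (v1 + total) ^ ((v1 >> 5) + k1))) & MASK
        v1 = (v1 + (((v0 << 4) + k2) ^ (v0 + total) ^ ((v0 >> 5) + k3))) & MASK
    return struct.pack(">2I", v0, v1)


def tea_decrypt_block(block, key):
    """Invert tea_encrypt_block by running the rounds backwards."""
    v0, v1 = struct.unpack(">2I", block)
    k0, k1, k2, k3 = struct.unpack(">4I", key)
    total = (DELTA * ROUNDS) & MASK
    for _ in range(ROUNDS):
        v1 = (v1 - (((v0 << 4) + k2) ^ (v0 + total) ^ ((v0 >> 5) + k3))) & MASK
        v0 = (v0 - (((v1 << 4) + k0) ^ (v1 + total) ^ ((v1 >> 5) + k1))) & MASK
        total = (total - DELTA) & MASK
    return struct.pack(">2I", v0, v1)


key = bytes(range(16))   # hypothetical key, for illustration only
plain = b"pycode!!"      # exactly one 8-byte block
cipher = tea_encrypt_block(plain, key)
assert tea_decrypt_block(cipher, key) == plain
```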

I would still like more insight into how this could be done. Moreover, I don't know how the decryption happens in this process... I would like to hear from all the experts here. Come on guys, where are you?

Answer 1:

I suppose this is about shuffling the opcode numbers in Include/opcode.h. I don't see a #define loadup there, though; maybe that refers to some old Python version. I have not tried this.

This will obfuscate your .pyc files so that they cannot be inspected by tools that expect normal .pyc files. This may help you hide some security measures inside your program. However, an attacker might be able (for example) to extract your custom Python interpreter from your app bundle and use it to inspect the files. (Just launch the interactive interpreter and start investigating by importing a module and calling dir on it.)
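The kind of poking around I mean looks like this, run inside the extracted interpreter. Here `json` is only a stand-in for one of the application's own modules. Note that this needs no knowledge of the opcode numbering at all, which is presumably why the Dropbox interpreter also blocks access to code objects.

```python
import importlib

# "json" stands in for a module shipped inside the application bundle.
module = importlib.import_module("json")

# List everything the module exposes publicly.
public_names = [name for name in dir(module) if not name.startswith("_")]
print(public_names)

# Code objects expose referenced names and constants without any disassembly,
# unless the custom interpreter blocks access to these attributes.
code = module.dumps.__code__
print(code.co_names)
print(code.co_consts)
```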

Note also that your package will surely contain some modules from the Python standard library. If an attacker guesses that you have shuffled the opcodes, they could do a byte-for-byte comparison between your version and the normal version of a standard module and recover your opcode mapping that way. To prevent this simple attack, one can protect the modules with proper encryption and try to hide the decryption step in the interpreter, as mentioned in the updated question. This forces the attacker to use machine code debugging to look for the decryption code.
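Here is a small simulation of that comparison attack. Since I have no scrambled interpreter at hand, the "secret" table and the `scramble` helper below are made up to play its role. The sketch also assumes the CPython 3.6+ bytecode layout, where every instruction is two bytes with the opcode at the even offset, and that both builds emit the same instruction sequence for the same source.

```python
import random


def scramble(code_bytes, mapping):
    """Simulate the custom build: rewrite the opcode byte of each instruction."""
    out = bytearray(code_bytes)
    for i in range(0, len(out), 2):
        out[i] = mapping[out[i]]
    return bytes(out)


def recover_mapping(standard, scrambled):
    """Attacker's side: pair up opcode bytes of the two versions."""
    recovered = {}
    for i in range(0, len(standard), 2):
        recovered[scrambled[i]] = standard[i]
    return recovered


def sample(values):
    # Stands in for a standard library function present in both builds.
    total = 0
    for v in values:
        if v > 0:
            total += v * 2
    return total


# Hypothetical secret table: a random permutation of all 256 byte values.
rng = random.Random(7)
targets = list(range(256))
rng.shuffle(targets)
secret = dict(zip(range(256), targets))

standard = sample.__code__.co_code
scrambled = scramble(standard, secret)

recovered = recover_mapping(standard, scrambled)
restored = scramble(scrambled, recovered)
assert restored == standard
print(f"recovered {len(recovered)} opcode numbers from one function")
```

One function only reveals the opcodes it happens to use, but the standard library is large enough that comparing many modules should cover most of the table.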


"I don't know how the decryption happens in this process..."

You would modify the part of the interpreter that imports modules and insert your decryption C code there.
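The same flow can be sketched at the Python level, which may make it easier to see where the decryption sits. Everything here is illustrative: `protect`, `load_protected`, the key, and the module name are made-up names, and the XOR routine is only a placeholder for a real cipher. In the actual scheme this logic would live in C inside the interpreter's import code, where it is much harder to find.

```python
import marshal
import types


def xor_bytes(data, key):
    # Placeholder for a real cipher; XOR is NOT secure, it only marks the spot.
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))


def protect(source, name, key):
    """Build step: compile the source, marshal the code object, encrypt it."""
    code = compile(source, name, "exec")
    return xor_bytes(marshal.dumps(code), key)


def load_protected(blob, name, key):
    """Import step: decrypt, unmarshal, and execute into a fresh module."""
    code = marshal.loads(xor_bytes(blob, key))
    module = types.ModuleType(name)
    exec(code, module.__dict__)
    return module


KEY = b"not-a-real-key"  # hypothetical key, for illustration only
blob = protect("def greet(who):\n    return 'hello ' + who\n", "secret_mod", KEY)

mod = load_protected(blob, "secret_mod", KEY)
print(mod.greet("world"))  # prints "hello world"
```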