How can I understand the contents of a .pyc file?

Posted 2019-01-18 07:10

Question:

I have a .pyc file. I want to understand its contents so that I can see how Python's disassembler works, i.e. how can I generate output like that of dis.dis(function) from the contents of a .pyc file?

For example:

>>> def sqr(x):  
...     return x*x
...
>>> import dis
>>> dis.dis(sqr)
  2           0 LOAD_FAST                0 (x)
              3 LOAD_FAST                0 (x)
              6 BINARY_MULTIPLY     
              7 RETURN_VALUE        

I need to produce output like this from the .pyc file.

Answer 1:

.pyc files contain some metadata and a marshalled code object. To load the code object and disassemble it, use:

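import dis, marshal, sys

# Header size: 8 bytes before Python 3.3, 12 bytes in 3.3 to 3.6, and
# 16 bytes since 3.7 (PEP 552 added a flags field).
if sys.version_info >= (3, 7):
    header_size = 16
elif sys.version_info >= (3, 3):
    header_size = 12
else:
    header_size = 8

pycfile = "example.pyc"  # hypothetical path; substitute your own .pyc file

with open(pycfile, "rb") as f:
    header = f.read(header_size)  # magic number plus version-dependent metadata
    code = marshal.load(f)        # rest is a marshalled code object

dis.dis(code)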
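To see what the skipped header bytes hold, the sketch below unpacks them field by field. It is an illustration under stated assumptions, not a reference parser: the helper name read_pyc_header is made up for this example, the layout is picked from the running interpreter's version (so it only suits files written by that same version), and the usage part needs Python 3.7 or later because of the invalidation_mode argument.

```python
import importlib.util
import os
import py_compile
import struct
import sys
import tempfile


def read_pyc_header(f):
    """Read the .pyc header from an open binary file and return it as a dict.

    Hypothetical helper: the layout is chosen from the *running*
    interpreter's version, so it assumes the file came from the same Python.
    """
    header = {"magic": f.read(4)}  # 2-byte version tag followed by b"\r\n"
    if sys.version_info >= (3, 7):
        # PEP 552 added a 32-bit flags word after the magic number.
        (header["flags"],) = struct.unpack("<I", f.read(4))
        if header["flags"] & 0x01:
            # Hash-based .pyc: an 8-byte source hash replaces mtime and size.
            header["source_hash"] = f.read(8)
            return header
    (header["mtime"],) = struct.unpack("<I", f.read(4))
    if sys.version_info >= (3, 3):
        (header["source_size"],) = struct.unpack("<I", f.read(4))
    return header


# Usage: compile a throwaway module, then inspect its header.
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "demo.py")
    with open(src, "w") as out:
        out.write("x = 1\n")
    pyc = py_compile.compile(
        src,
        cfile=os.path.join(tmp, "demo.pyc"),
        invalidation_mode=py_compile.PycInvalidationMode.TIMESTAMP,
    )
    with open(pyc, "rb") as f:
        header = read_pyc_header(f)

print(header)
```

After the header has been consumed this way, the file position sits at the start of the marshalled code object, so marshal.load(f) could follow directly.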

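Demo with the bisect module, recorded on Python 2.7 (on Python 3, bisect.__file__ is the .py source file; importlib.util.cache_from_source(bisect.__file__) gives the path of the cached .pyc instead):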

>>> import bisect
>>> import dis, marshal
>>> import sys
>>> header_size = 16 if sys.version_info >= (3, 7) else 12 if sys.version_info >= (3, 3) else 8
>>> with open(bisect.__file__, "rb") as f:
...     header = f.read(header_size)  # first 8, 12 or 16 bytes are metadata
...     code = marshal.load(f)        # rest is a marshalled code object
... 
>>> dis.dis(code)
  1           0 LOAD_CONST               0 ('Bisection algorithms.')
              3 STORE_NAME               0 (__doc__)

  3           6 LOAD_CONST               1 (0)
              9 LOAD_CONST               8 (None)
             12 LOAD_CONST               2 (<code object insort_right at 0x106a459b0, file "/Users/mpieters/Development/Library/buildout.python/parts/opt/lib/python2.7/bisect.py", line 3>)
             15 MAKE_FUNCTION            2
             18 STORE_NAME               2 (insort_right)

 22          21 LOAD_NAME                2 (insort_right)
             24 STORE_NAME               3 (insort)

 24          27 LOAD_CONST               1 (0)
             30 LOAD_CONST               8 (None)
             33 LOAD_CONST               3 (<code object bisect_right at 0x106a45ab0, file "/Users/mpieters/Development/Library/buildout.python/parts/opt/lib/python2.7/bisect.py", line 24>)
             36 MAKE_FUNCTION            2
             39 STORE_NAME               4 (bisect_right)

 45          42 LOAD_NAME                4 (bisect_right)
             45 STORE_NAME               5 (bisect)

 47          48 LOAD_CONST               1 (0)
             51 LOAD_CONST               8 (None)
             54 LOAD_CONST               4 (<code object insort_left at 0x106a45bb0, file "/Users/mpieters/Development/Library/buildout.python/parts/opt/lib/python2.7/bisect.py", line 47>)
             57 MAKE_FUNCTION            2
             60 STORE_NAME               6 (insort_left)

 67          63 LOAD_CONST               1 (0)
             66 LOAD_CONST               8 (None)
             69 LOAD_CONST               5 (<code object bisect_left at 0x106a45cb0, file "/Users/mpieters/Development/Library/buildout.python/parts/opt/lib/python2.7/bisect.py", line 67>)
             72 MAKE_FUNCTION            2
             75 STORE_NAME               7 (bisect_left)

 89          78 SETUP_EXCEPT            14 (to 95)

 90          81 LOAD_CONST               6 (-1)
             84 LOAD_CONST               7 (('*',))
             87 IMPORT_NAME              8 (_bisect)
             90 IMPORT_STAR         
             91 POP_BLOCK           
             92 JUMP_FORWARD            17 (to 112)

 91     >>   95 DUP_TOP             
             96 LOAD_NAME                9 (ImportError)
             99 COMPARE_OP              10 (exception match)
            102 POP_JUMP_IF_FALSE      111
            105 POP_TOP             
            106 POP_TOP             
            107 POP_TOP             

 92         108 JUMP_FORWARD             1 (to 112)
        >>  111 END_FINALLY         
        >>  112 LOAD_CONST               8 (None)
            115 RETURN_VALUE        

Note that this is just the top-level code object, which defines the module. To analyse the functions it contains, load the nested code objects from the top-level code.co_consts tuple. For example, the insort_right function's code object is loaded with LOAD_CONST 2, so look for the code object at that index:

>>> code.co_consts[2]
<code object insort_right at 0x106a459b0, file "/Users/mpieters/Development/Library/buildout.python/parts/opt/lib/python2.7/bisect.py", line 3>
>>> dis.dis(code.co_consts[2])
 12           0 LOAD_FAST                2 (lo)
              3 LOAD_CONST               1 (0)
              6 COMPARE_OP               0 (<)
              9 POP_JUMP_IF_FALSE       27

 13          12 LOAD_GLOBAL              0 (ValueError)
             15 LOAD_CONST               2 ('lo must be non-negative')
             18 CALL_FUNCTION            1
             21 RAISE_VARARGS            1
             24 JUMP_FORWARD             0 (to 27)

 14     >>   27 LOAD_FAST                3 (hi)
             30 LOAD_CONST               5 (None)
             33 COMPARE_OP               8 (is)
             36 POP_JUMP_IF_FALSE       54

 15          39 LOAD_GLOBAL              2 (len)
             42 LOAD_FAST                0 (a)
             45 CALL_FUNCTION            1
             48 STORE_FAST               3 (hi)
             51 JUMP_FORWARD             0 (to 54)

 16     >>   54 SETUP_LOOP              65 (to 122)
        >>   57 LOAD_FAST                2 (lo)
             60 LOAD_FAST                3 (hi)
             63 COMPARE_OP               0 (<)
             66 POP_JUMP_IF_FALSE      121

 17          69 LOAD_FAST                2 (lo)
             72 LOAD_FAST                3 (hi)
             75 BINARY_ADD          
             76 LOAD_CONST               3 (2)
             79 BINARY_FLOOR_DIVIDE 
             80 STORE_FAST               4 (mid)

 18          83 LOAD_FAST                1 (x)
             86 LOAD_FAST                0 (a)
             89 LOAD_FAST                4 (mid)
             92 BINARY_SUBSCR       
             93 COMPARE_OP               0 (<)
             96 POP_JUMP_IF_FALSE      108
             99 LOAD_FAST                4 (mid)
            102 STORE_FAST               3 (hi)
            105 JUMP_ABSOLUTE           57

 19     >>  108 LOAD_FAST                4 (mid)
            111 LOAD_CONST               4 (1)
            114 BINARY_ADD          
            115 STORE_FAST               2 (lo)
            118 JUMP_ABSOLUTE           57
        >>  121 POP_BLOCK           

 20     >>  122 LOAD_FAST                0 (a)
            125 LOAD_ATTR                3 (insert)
            128 LOAD_FAST                2 (lo)
            131 LOAD_FAST                1 (x)
            134 CALL_FUNCTION            2
            137 POP_TOP             
            138 LOAD_CONST               5 (None)
            141 RETURN_VALUE        
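Looking up each index by hand gets tedious for larger modules. A minimal sketch of walking co_consts recursively follows; iter_code_objects is a name invented for this example, and the code object is compiled from an inline string here only to keep the snippet self-contained. Since Python 3.7, dis.dis() itself recurses into nested code objects, so a walker like this is mostly useful on older versions or when you want the code objects themselves.

```python
import types


def iter_code_objects(code):
    """Yield `code` and every code object nested in its constants, depth-first."""
    yield code
    for const in code.co_consts:
        if isinstance(const, types.CodeType):
            for nested in iter_code_objects(const):
                yield nested


# Usage with a code object compiled from source; one loaded from a .pyc
# via marshal.load() can be walked the same way.
source = "def outer():\n    def inner():\n        return 1\n    return inner\n"
module_code = compile(source, "<demo>", "exec")
names = [c.co_name for c in iter_code_objects(module_code)]
print(names)
```

Passing each yielded object to dis.dis() would then print one listing per function.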

I personally would avoid trying to parse the .pyc file with anything other than the matching Python version and its marshal module. The marshal format is essentially an internal serialisation format that changes with the needs of Python itself. New features such as list comprehensions, with statements, and async/await require additions to the format, which is documented only in the C source code.
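One way to act on that advice is to compare the file's magic number with the running interpreter's before unmarshalling anything. The sketch below assumes Python 3.4 or later (for importlib.util.MAGIC_NUMBER); the function name written_by_this_python is hypothetical.

```python
import importlib.util


def written_by_this_python(pyc_bytes):
    """Return True if the data starts with the running interpreter's magic number."""
    return pyc_bytes[:4] == importlib.util.MAGIC_NUMBER


# A stand-in header for the current interpreter: magic number plus padding.
fake_header = importlib.util.MAGIC_NUMBER + b"\x00" * 12
print(written_by_this_python(fake_header))

# The magic number written by Python 2.7, which no Python 3 interpreter shares.
print(written_by_this_python(b"\x03\xf3\r\n"))
```

In practice you would pass the first bytes read from the .pyc file and refuse to call marshal.load() when the check fails.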

If you do go this route and manage to read a code object by some means other than the marshal module, you'll have to build the disassembly from the various attributes of the code object. See the dis module source for details on how to do this; for example, you'll have to use the co_firstlineno and co_lnotab attributes (or, on Python 3.10 and later, the co_lines() method) to create a map from bytecode offsets to line numbers.
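As a rough illustration of that offset-to-line map, the sketch below leans on the dis module's own helpers rather than decoding co_lnotab by hand, so it shows the shape of the result, not an independent re-implementation. The function names offset_to_line and format_listing are made up for this example, and dis.get_instructions requires Python 3.4 or later.

```python
import dis


def offset_to_line(code):
    """Map each bytecode offset that starts a source line to its line number."""
    return dict(dis.findlinestarts(code))


def format_listing(code):
    """Return a simplified dis-like listing: offset, opcode name, argument."""
    rows = []
    for ins in dis.get_instructions(code):
        rows.append("%6d %-24s %s" % (ins.offset, ins.opname, ins.argrepr))
    return "\n".join(rows)


# Usage on a small code object compiled from source.
demo_code = compile("a = 1\nb = a * 2\n", "<demo>", "exec")
line_map = offset_to_line(demo_code)
listing = format_listing(demo_code)
print(line_map)
print(listing)
```

The exact opcodes and offsets printed depend on the Python version, which is one more reason to disassemble with the interpreter that produced the file.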