I have a .pyc file. I need to understand the contents of that file to learn how Python's disassembler works, i.e. how can I generate output like that of dis.dis(function) from the contents of a .pyc file?
For example:
>>> def sqr(x):
...     return x*x
...
>>> import dis
>>> dis.dis(sqr)
2 0 LOAD_FAST 0 (x)
3 LOAD_FAST 0 (x)
6 BINARY_MULTIPLY
7 RETURN_VALUE
I need to get output like this from the .pyc file.
.pyc files contain some metadata and a marshalled code object; to load the code object and disassemble it, use:
import dis, marshal, sys
# Header size: 8 bytes before Python 3.3, 12 bytes for 3.3 to 3.6,
# 16 bytes from 3.7 on (PEP 552 added a flags field).
if sys.version_info >= (3, 7):
    header_size = 16
elif sys.version_info >= (3, 3):
    header_size = 12
else:
    header_size = 8
with open(pycfile, "rb") as f:  # pycfile: path to the .pyc file
    header = f.read(header_size)  # magic number and other metadata
    code = marshal.load(f)  # rest is a marshalled code object
dis.dis(code)
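If you want to look at the metadata rather than skip it, here is a minimal sketch that splits the header into its fields before unmarshalling. It assumes Python 3.4 or newer (for importlib.util.MAGIC_NUMBER) and, for 3.7+, the header layout described in PEP 552; read_pyc is a hypothetical helper name, not something from the standard library:

```python
import importlib.util
import marshal
import struct
import sys


def read_pyc(path):
    """Return (header_info, code_object) for a .pyc written by this interpreter."""
    with open(path, "rb") as f:
        magic = f.read(4)
        # Refuse files produced by a different Python version; marshal.load()
        # is only reliable when the magic number matches.
        if magic != importlib.util.MAGIC_NUMBER:
            raise ValueError("magic number %r does not match this interpreter" % magic)
        info = {"magic": magic}
        if sys.version_info >= (3, 7):
            # PEP 552: a 32-bit little-endian flags field follows the magic number.
            (info["flags"],) = struct.unpack("<I", f.read(4))
            if info["flags"] & 0x1:
                # Hash-based .pyc: an 8-byte source hash replaces mtime and size.
                info["source_hash"] = f.read(8)
            else:
                info["mtime"], info["source_size"] = struct.unpack("<II", f.read(8))
        else:
            # Python 3.3 to 3.6: source mtime and source size, no flags field.
            info["mtime"], info["source_size"] = struct.unpack("<II", f.read(8))
        code = marshal.load(f)  # everything after the header
    return info, code
```

Calling info, code = read_pyc(pycfile) and then dis.dis(code) should give the same disassembly as the shorter snippet; the only difference is that the header fields are kept around for inspection.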
Demo with the bisect module:
>>> import bisect
>>> import dis, marshal
>>> import sys
>>> header_size = 16 if sys.version_info >= (3, 7) else 12 if sys.version_info >= (3, 3) else 8
>>> with open(bisect.__file__, "rb") as f:
...     header = f.read(header_size)  # first 8, 12 or 16 bytes are metadata
...     code = marshal.load(f)  # rest is a marshalled code object
...
>>> dis.dis(code)
1 0 LOAD_CONST 0 ('Bisection algorithms.')
3 STORE_NAME 0 (__doc__)
3 6 LOAD_CONST 1 (0)
9 LOAD_CONST 8 (None)
12 LOAD_CONST 2 (<code object insort_right at 0x106a459b0, file "/Users/mpieters/Development/Library/buildout.python/parts/opt/lib/python2.7/bisect.py", line 3>)
15 MAKE_FUNCTION 2
18 STORE_NAME 2 (insort_right)
22 21 LOAD_NAME 2 (insort_right)
24 STORE_NAME 3 (insort)
24 27 LOAD_CONST 1 (0)
30 LOAD_CONST 8 (None)
33 LOAD_CONST 3 (<code object bisect_right at 0x106a45ab0, file "/Users/mpieters/Development/Library/buildout.python/parts/opt/lib/python2.7/bisect.py", line 24>)
36 MAKE_FUNCTION 2
39 STORE_NAME 4 (bisect_right)
45 42 LOAD_NAME 4 (bisect_right)
45 STORE_NAME 5 (bisect)
47 48 LOAD_CONST 1 (0)
51 LOAD_CONST 8 (None)
54 LOAD_CONST 4 (<code object insort_left at 0x106a45bb0, file "/Users/mpieters/Development/Library/buildout.python/parts/opt/lib/python2.7/bisect.py", line 47>)
57 MAKE_FUNCTION 2
60 STORE_NAME 6 (insort_left)
67 63 LOAD_CONST 1 (0)
66 LOAD_CONST 8 (None)
69 LOAD_CONST 5 (<code object bisect_left at 0x106a45cb0, file "/Users/mpieters/Development/Library/buildout.python/parts/opt/lib/python2.7/bisect.py", line 67>)
72 MAKE_FUNCTION 2
75 STORE_NAME 7 (bisect_left)
89 78 SETUP_EXCEPT 14 (to 95)
90 81 LOAD_CONST 6 (-1)
84 LOAD_CONST 7 (('*',))
87 IMPORT_NAME 8 (_bisect)
90 IMPORT_STAR
91 POP_BLOCK
92 JUMP_FORWARD 17 (to 112)
91 >> 95 DUP_TOP
96 LOAD_NAME 9 (ImportError)
99 COMPARE_OP 10 (exception match)
102 POP_JUMP_IF_FALSE 111
105 POP_TOP
106 POP_TOP
107 POP_TOP
92 108 JUMP_FORWARD 1 (to 112)
>> 111 END_FINALLY
>> 112 LOAD_CONST 8 (None)
115 RETURN_VALUE
Note that this is just the top-level code object, defining the module. If you want to analyse the functions it contains, you'll need to load the nested code objects from the top-level code.co_consts array; for example, the insort_right function's code object is loaded with LOAD_CONST 2, so look for the code object at that index:
>>> code.co_consts[2]
<code object insort_right at 0x106a459b0, file "/Users/mpieters/Development/Library/buildout.python/parts/opt/lib/python2.7/bisect.py", line 3>
>>> dis.dis(code.co_consts[2])
12 0 LOAD_FAST 2 (lo)
3 LOAD_CONST 1 (0)
6 COMPARE_OP 0 (<)
9 POP_JUMP_IF_FALSE 27
13 12 LOAD_GLOBAL 0 (ValueError)
15 LOAD_CONST 2 ('lo must be non-negative')
18 CALL_FUNCTION 1
21 RAISE_VARARGS 1
24 JUMP_FORWARD 0 (to 27)
14 >> 27 LOAD_FAST 3 (hi)
30 LOAD_CONST 5 (None)
33 COMPARE_OP 8 (is)
36 POP_JUMP_IF_FALSE 54
15 39 LOAD_GLOBAL 2 (len)
42 LOAD_FAST 0 (a)
45 CALL_FUNCTION 1
48 STORE_FAST 3 (hi)
51 JUMP_FORWARD 0 (to 54)
16 >> 54 SETUP_LOOP 65 (to 122)
>> 57 LOAD_FAST 2 (lo)
60 LOAD_FAST 3 (hi)
63 COMPARE_OP 0 (<)
66 POP_JUMP_IF_FALSE 121
17 69 LOAD_FAST 2 (lo)
72 LOAD_FAST 3 (hi)
75 BINARY_ADD
76 LOAD_CONST 3 (2)
79 BINARY_FLOOR_DIVIDE
80 STORE_FAST 4 (mid)
18 83 LOAD_FAST 1 (x)
86 LOAD_FAST 0 (a)
89 LOAD_FAST 4 (mid)
92 BINARY_SUBSCR
93 COMPARE_OP 0 (<)
96 POP_JUMP_IF_FALSE 108
99 LOAD_FAST 4 (mid)
102 STORE_FAST 3 (hi)
105 JUMP_ABSOLUTE 57
19 >> 108 LOAD_FAST 4 (mid)
111 LOAD_CONST 4 (1)
114 BINARY_ADD
115 STORE_FAST 2 (lo)
118 JUMP_ABSOLUTE 57
>> 121 POP_BLOCK
20 >> 122 LOAD_FAST 0 (a)
125 LOAD_ATTR 3 (insert)
128 LOAD_FAST 2 (lo)
131 LOAD_FAST 1 (x)
134 CALL_FUNCTION 2
137 POP_TOP
138 LOAD_CONST 5 (None)
141 RETURN_VALUE
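Rather than picking indexes out of co_consts by hand, you can walk the constants recursively. The following is only a sketch; iter_code_objects and disassemble_all are names I made up for illustration. It uses dis.disassemble(), which handles a single code object, because on Python 3.7 and later dis.dis() already descends into nested code objects by itself:

```python
import dis
import types


def iter_code_objects(code):
    """Yield `code` and every code object nested in its co_consts, depth-first."""
    yield code
    for const in code.co_consts:
        if isinstance(const, types.CodeType):
            # Functions, classes, lambdas and the like are stored as
            # constants of the enclosing code object.
            for nested in iter_code_objects(const):
                yield nested


def disassemble_all(code):
    """Print a separate disassembly for each code object found."""
    for obj in iter_code_objects(code):
        print("Disassembly of %s (line %d):" % (obj.co_name, obj.co_firstlineno))
        dis.disassemble(obj)  # one code object only, no recursion
        print("")
```

With the code object loaded from the .pyc file above, disassemble_all(code) would print the module-level bytecode first, followed by each function in the order it appears in co_consts.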
I personally would avoid trying to parse the .pyc file with anything other than the matching Python version and its marshal module. The marshal format is basically an internal serialisation format that changes with the needs of Python itself. New features like list comprehensions, with statements and async/await require new additions to the format, and the format is not documented anywhere other than in the C source code.
If you do go this route, and manage to read a code object by means other than the marshal module, you'll have to build the disassembly yourself from the various attributes of the code object; see the dis module source for details on how to do this (you'll have to use the co_firstlineno and co_lnotab attributes to create a bytecode-offset-to-line-number map, for example).
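For the line-number map in particular, you don't have to decode co_lnotab by hand as long as you still have a working dis module: dis.findlinestarts() yields (offset, line) pairs for a code object. A rough sketch, assuming Python 3.4+ for dis.get_instructions(); the two function names are made up for illustration:

```python
import dis


def offset_to_line(code):
    """Map each bytecode offset that starts a source line to its line number."""
    # dis.findlinestarts() decodes the code object's line table for us.
    return {offset: lineno
            for offset, lineno in dis.findlinestarts(code)
            if lineno is not None}


def instructions_with_lines(code):
    """Return (line, offset, opname, argrepr) rows, similar to dis.dis() output."""
    starts = offset_to_line(code)
    current = None
    rows = []
    for ins in dis.get_instructions(code):
        # An instruction belongs to the most recent line start at or before it.
        current = starts.get(ins.offset, current)
        rows.append((current, ins.offset, ins.opname, ins.argrepr))
    return rows
```

Printing the rows from instructions_with_lines(code) gives roughly the columns that dis.dis() shows. A parser that avoids the dis module entirely would need to reimplement what findlinestarts() does for the specific Python version that wrote the file.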