While going through Julia, I wanted to have functionality similar to Python's dis
module.
Searching around the net, I found out that the Julia community has worked on this issue and provided these (https://github.com/JuliaLang/julia/issues/218):
finfer -> code_typed
methods(function, types) -> code_lowered
disassemble(function, types, true) -> code_native
disassemble(function, types, false) -> code_llvm
I have tried these personally using the Julia REPL, but I find them quite hard to understand.
In Python, I can disassemble a function like this.
>>> import dis
>>> dis.dis(lambda x: 2*x)
1 0 LOAD_CONST 1 (2)
3 LOAD_FAST 0 (x)
6 BINARY_MULTIPLY
7 RETURN_VALUE
>>>
Can anyone who has worked with these help me understand them more? Thanks.
The standard CPython implementation of Python parses source code and does some pre-processing and simplification of it – aka "lowering" – transforming it to a machine-friendly, easy-to-interpret format called "bytecode". This is what is displayed when you "disassemble" a Python function. This code is not executable by the hardware – it is "executable" by the CPython interpreter. CPython's bytecode format is fairly simple, partly because that's what interpreters tend to do well with – if the bytecode is too complex, it slows down the interpreter – and partly because the Python community tends to put a high premium on simplicity, sometimes at the cost of high performance.
Julia's implementation is not interpreted, it is just-in-time (JIT) compiled. This means that when you call a function, it is transformed to machine code which is executed directly by the native hardware. This process is quite a bit more complex than the parsing and lowering to bytecode that Python does, but in exchange for that complexity, Julia gets its hallmark speed. (The PyPy JIT for Python is also much more complex than CPython but also typically much faster – increased complexity is a fairly typical cost for speed.) The four levels of "disassembly" for Julia code give you access to the representation of a Julia method implementation for particular argument types at different stages of the transformation from source code to machine code. I'll use the following function which computes the next Fibonacci number after its argument as an example:
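A minimal definition consistent with the operations discussed below (one(n), b < n, a + b) would look something like this:

function nextfib(n)
    # start from the first two Fibonacci numbers, using one(n) so the
    # arithmetic stays in the argument's type
    a, b = one(n), one(n)
    while b < n
        # step forward: each new value is the sum of the previous two
        a, b = b, a + b
    end
    return b
end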
Lowered code. The @code_lowered macro displays code in a format that is the closest to Python bytecode, but rather than being intended for execution by an interpreter, it's intended for further transformation by a compiler. This format is largely internal and not intended for human consumption. The code is transformed into "static single assignment" (SSA) form, in which "each variable is assigned exactly once, and every variable is defined before it is used". Loops and conditionals are transformed into gotos and labels using a single unless/goto construct (this is not exposed in user-level Julia). Here's our example code in lowered form (in Julia 0.6.0-pre.beta.134, which is just what I happen to have available):
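(Exact output varies between Julia versions; the label and SSAValue numbers below are illustrative.)

julia> @code_lowered nextfib(123)
CodeInfo(:(begin
        nothing
        SSAValue(0) = (Main.one)(n)
        SSAValue(1) = (Main.one)(n)
        a = SSAValue(0)
        b = SSAValue(1) # line 3:
        7:
        unless b < n goto 16 # line 4:
        SSAValue(2) = b
        SSAValue(3) = a + b
        a = SSAValue(2)
        b = SSAValue(3)
        14:
        goto 7
        16:  # line 6:
        return b
    end))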
You can see the SSAValue nodes and unless/goto constructs and label numbers. This is not that hard to read, but again, it's also not really meant to be easy for human consumption. Lowered code doesn't depend on the types of the arguments, except insofar as they determine which method body to call – as long as the same method is called, the same lowered code applies.

Typed code. The @code_typed macro presents a method implementation for a particular set of argument types after type inference and inlining. This incarnation of the code is similar to the lowered form, but with expressions annotated with type information and some generic function calls replaced with their implementations. For example, here is the typed code for our example function:
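(Again, the exact printout is version-dependent; this is roughly what Julia 0.6 shows.)

julia> @code_typed nextfib(123)
CodeInfo(:(begin
        a = 1
        b = 1 # line 3:
        4:
        unless (Base.slt_int)(b, n)::Bool goto 13 # line 4:
        SSAValue(2) = b
        SSAValue(3) = (Base.add_int)(a, b)::Int64
        a = SSAValue(2)
        b = SSAValue(3)
        11:
        goto 4
        13:  # line 6:
        return b
    end))=>Int64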
Calls to one(n) have been replaced with the literal Int64 value 1 (on my system the default integer type is Int64). The expression b < n has been replaced with its implementation in terms of the slt_int intrinsic ("signed integer less than") and the result of this has been annotated with return type Bool. The expression a + b has also been replaced with its implementation in terms of the add_int intrinsic and its result type annotated as Int64. And the return type of the entire function body has been annotated as Int64.

Unlike lowered code, which depends only on argument types to determine which method body is called, the details of typed code depend on argument types:
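(Approximate output; the exact formatting varies by version.)

julia> @code_typed nextfib(Int128(100))
CodeInfo(:(begin
        a = (Base.sext_int)(Int128, 1)::Int128
        b = (Base.sext_int)(Int128, 1)::Int128 # line 3:
        5:
        unless (Base.slt_int)(b, n)::Bool goto 14 # line 4:
        SSAValue(2) = b
        SSAValue(3) = (Base.add_int)(a, b)::Int128
        a = SSAValue(2)
        b = SSAValue(3)
        12:
        goto 5
        14:  # line 6:
        return b
    end))=>Int128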
This is the typed version of the nextfib function for an Int128 argument. The literal 1 must be sign extended to Int128 and the result types of operations are of type Int128 instead of Int64. The typed code can be quite different if the implementation of a type is considerably different. For example, nextfib for BigInts is significantly more involved than for simple "bits types" like Int64 and Int128:
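You can try it yourself; the output is far too long to usefully reproduce here:

julia> @code_typed nextfib(big(100))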
This reflects the fact that operations on BigInts are pretty complicated and involve memory allocation and calls to the external GMP library (libgmp).

LLVM IR. Julia uses the LLVM compiler framework to generate machine code. LLVM defines an assembly-like language which it uses as a shared intermediate representation (IR) between different compiler optimization passes and other tools in the framework. There are three isomorphic forms of LLVM IR:

1. A compact binary representation ("bitcode").
2. A textual representation, the assembly-like language, which is verbose and somewhat human-readable.
3. An in-memory representation, generated and consumed through LLVM's APIs.
Julia uses LLVM's C++ API to construct LLVM IR in memory (form 3) and then calls some LLVM optimization passes on that form. When you do @code_llvm you see the LLVM IR after generation and some high-level optimizations. Here's LLVM code for our ongoing example:
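(This is approximately what Julia 0.6 emits on x86-64; the function name's numeric suffix and the register and label numbers will differ on your machine.)

julia> @code_llvm nextfib(123)

define i64 @julia_nextfib_62047(i64) #0 {
top:
  br label %L4

L4:                                               ; preds = %L4, %top
  %storemerge1 = phi i64 [ 1, %top ], [ %storemerge, %L4 ]
  %storemerge = phi i64 [ 1, %top ], [ %2, %L4 ]
  %1 = icmp slt i64 %storemerge, %0
  %2 = add i64 %storemerge, %storemerge1
  br i1 %1, label %L4, label %L13

L13:                                              ; preds = %L4
  ret i64 %storemerge
}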
This is the textual form of the in-memory LLVM IR for the nextfib(123) method implementation. LLVM is not easy to read – it's not intended to be written or read by people most of the time – but it is thoroughly specified and documented. Once you get the hang of it, it's not hard to understand. This code jumps to the label L4 and initializes the "registers" %storemerge1 and %storemerge with the i64 (LLVM's name for Int64) value 1 (their values are derived differently when jumped to from different locations – that's what the phi instruction does). It then does an icmp slt comparing %storemerge with register %0 – which holds the argument untouched for the entire method execution – and saves the comparison result into the register %1. It does an add i64 on %storemerge and %storemerge1 and saves the result into register %2. If %1 is true, it branches back to L4 and otherwise it branches to L13. When the code loops back to L4, the register %storemerge1 gets the previous value of %storemerge and %storemerge gets the previous value of %2.
Native code. Since Julia executes native code, the last form a method implementation takes is what the machine actually executes. This is just binary code in memory, which is rather hard to read, so long ago people invented various forms of "assembly language" which represent instructions and registers with names and have some amount of simple syntax to help express what instructions do. In general, assembly language remains in close to one-to-one correspondence with machine code; in particular, one can always "disassemble" machine code into assembly code. Here's our example:
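(Representative output from Julia 0.6 on an x86-64 machine; the exact registers, labels, and padding will differ across machines and versions.)

julia> @code_native nextfib(123)
        .text
        pushq   %rbp
        movq    %rsp, %rbp
        movl    $1, %ecx
        movl    $1, %edx
L16:
        movq    %rdx, %rax
        movq    %rcx, %rdx
        addq    %rax, %rcx
        cmpq    %rdi, %rax
        jl      L16
        popq    %rbp
        retq
        nopw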
This is on an Intel Core i7, which is in the x86_64 CPU family. It only uses standard integer instructions, so it doesn't matter beyond that what the architecture is, but you can get different results for some code depending on the specific architecture of your machine, since JIT code can be different on different systems. The pushq and movq instructions at the beginning are a standard function preamble, saving registers to the stack; similarly, popq restores the registers and retq returns from the function; nopw is a 2-byte instruction that does nothing, included just to pad the length of the function. So the meat of the code is just this:
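(Register names and the L16 label are again just what my machine happens to produce.)

        movl    $1, %ecx
        movl    $1, %edx
L16:
        movq    %rdx, %rax
        movq    %rcx, %rdx
        addq    %rax, %rcx
        cmpq    %rdi, %rax
        jl      L16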
The movl instructions at the top initialize registers with 1 values. The movq instructions move values between registers and the addq instruction adds registers. The cmpq instruction compares two registers and jl either jumps back to L16 or continues on to return from the function. This handful of integer machine instructions in a tight loop is exactly what executes when your Julia function call runs, presented in a slightly more pleasant human-readable form. It's easy to see why it runs fast.

If you're interested in JIT compilation in general as compared to interpreted implementations, Eli Bendersky has a great pair of blog posts where he goes from a simple interpreter implementation of a language to a (simple) optimizing JIT for the same language.