I want to be able to generate C code dynamically and re-load it quickly into my running C program.
I am on Linux, how could this be done?
Can a library .so file on Linux be re-compiled and reloaded at runtime?
Could it be compiled without producing a .so file? Could the compiled output somehow go to memory and then be reloaded? I want to reload the compiled code quickly.
Are you sure C is the right answer here? There are various interpreted languages such as Lua, Bigloo Scheme, or perhaps even Python that embed very well into an existing C application. You can write the dynamic parts using the extension language, which will support reloading code at runtime.
The obvious disadvantage is performance - if you absolutely need the raw speed of compiled C then these may be a no-go.
What you want to do is reasonable, and I am doing exactly that in MELT (a high level domain specific language to extend GCC; MELT is compiled to C, thru a translator itself written in MELT).
First, when generating C code (or many other source languages), good advice is to keep some sort of abstract syntax tree (AST) in memory. So first build the entire AST of the generated C code, then emit it as C syntax. Don't design your code generation framework without an explicit AST (in other words, generating C code with a bunch of scattered printf calls is a maintenance nightmare; you want some intermediate representation).
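To make the AST advice concrete, here is a minimal sketch of a tiny expression AST with a separate emitter. The `Expr` type and the `ex_*`, `emit_*` and `free_expr` names are hypothetical and only illustrate the idea; a real generator (such as MELT's) is far richer.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical mini-AST for integer expressions (illustrative names only). */
typedef enum { EX_CONST, EX_VAR, EX_BINOP } ExprKind;

typedef struct Expr {
    ExprKind kind;
    int value;              /* used by EX_CONST */
    const char *name;       /* used by EX_VAR */
    char op;                /* used by EX_BINOP: '+', '-', '*' ... */
    struct Expr *lhs, *rhs; /* used by EX_BINOP */
} Expr;

static Expr *ex_new(ExprKind kind) {
    Expr *e = calloc(1, sizeof *e);
    if (!e) abort();
    e->kind = kind;
    return e;
}

Expr *ex_const(int value) { Expr *e = ex_new(EX_CONST); e->value = value; return e; }
Expr *ex_var(const char *name) { Expr *e = ex_new(EX_VAR); e->name = name; return e; }
Expr *ex_binop(char op, Expr *lhs, Expr *rhs) {
    Expr *e = ex_new(EX_BINOP);
    e->op = op; e->lhs = lhs; e->rhs = rhs;
    return e;
}

/* Emit the expression as fully parenthesized C syntax. */
void emit_expr(FILE *out, const Expr *e) {
    switch (e->kind) {
    case EX_CONST: fprintf(out, "%d", e->value); break;
    case EX_VAR:   fputs(e->name, out); break;
    case EX_BINOP:
        fputc('(', out);
        emit_expr(out, e->lhs);
        fprintf(out, " %c ", e->op);
        emit_expr(out, e->rhs);
        fputc(')', out);
        break;
    }
}

/* Emit a complete C function "int FNAME(int x)" returning the expression. */
void emit_function(FILE *out, const char *fname, const Expr *body) {
    fprintf(out, "int %s(int x) {\n    return ", fname);
    emit_expr(out, body);
    fputs(";\n}\n", out);
}

/* Release a whole tree. */
void free_expr(Expr *e) {
    if (!e) return;
    free_expr(e->lhs);
    free_expr(e->rhs);
    free(e);
}
```

Because the tree is built before any text is written, you can check or transform it first, and the printing stays confined to the two emit functions.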
Second, the main reason to generate C code is to take advantage of a good optimizing compiler (another reason is the portability and ubiquity of C). If you don't care about the performance of the generated code (and TCC compiles C very quickly into very naive and slow machine code), you could use some other approaches, e.g. JIT libraries like GNU lightning (very quick generation of slow machine code), GNU libjit or asmjit (generated machine code is a bit better), LLVM or libgccjit (good machine code generated, but generation time comparable to a compiler).
So if you generate C code and want it to run quickly, the compilation time of the C code is not negligible (since you probably would fork a `gcc -O -fPIC -shared` command to make some shared object `foo.so` out of your generated `foo.c`). From experience, generating C code takes much less time than compiling it (with `gcc -O`). In MELT, the generation of C code is more than 10x faster than its compilation by GCC (and usually 30x faster). But the optimizations done by a C compiler are worth it.
Once you have emitted your C code and forked its compilation into a `.so` shared object, you can `dlopen` it. Don't be shy: my manydl.c example demonstrates that on Linux you can dlopen a great many shared objects (many hundreds of thousands). The real bottleneck is the compilation of the generated C code. In practice, you don't really need to `dlclose` on Linux (unless you are coding a server program that needs to run for months); an unused shared module can in practice stay `dlopen`-ed, and you are mostly leaking process address space (which is a cheap resource), since most of that unused `.so` would be swapped out. `dlopen` is quick; what takes time is the compilation of the C source, because you really want the optimization to be done by the C compiler.
You could use many other approaches, e.g. have a bytecode interpreter and generate code for that bytecode, or use Common Lisp (e.g. SBCL on Linux, which compiles dynamically to machine code), LuaJIT, Java, MetaOCaml etc.
As others suggested, the time to write a C file does not matter much, and the file will stay in the filesystem cache in practice. Writing it is much faster than compiling it, so keeping it in memory is not worth the trouble. Use some tmpfs if you are concerned about I/O times.
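If you do want the generated files on a tmpfs, a minimal sketch could look like this. It assumes `/dev/shm` is a tmpfs mount, which is the usual Linux setup but not guaranteed, and falls back to `/tmp`. The names `pick_build_dir` and `write_generated_source` are hypothetical helpers, not part of any library.

```c
#include <stdio.h>
#include <unistd.h>

/* Pick a directory for generated files: /dev/shm (assumed to be a tmpfs,
   as on most Linux systems) when it is writable, otherwise /tmp. */
const char *pick_build_dir(void) {
    return access("/dev/shm", W_OK | X_OK) == 0 ? "/dev/shm" : "/tmp";
}

/* Write SOURCE to DIR/generated<serial>.c and store the file name in PATH.
   Returns 0 on success, -1 on failure. */
int write_generated_source(const char *dir, int serial, const char *source,
                           char *path, size_t pathsz) {
    int n = snprintf(path, pathsz, "%s/generated%d.c", dir, serial);
    if (n < 0 || (size_t)n >= pathsz)
        return -1;                 /* file name did not fit in PATH */
    FILE *f = fopen(path, "w");
    if (!f)
        return -1;
    int ok = fputs(source, f) >= 0;
    if (fclose(f) != 0)
        ok = 0;                    /* write errors can surface at close */
    return ok ? 0 : -1;
}
```

In a program that may run as several processes at once, you would probably also put the process id in the file name to avoid collisions.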
addenda
You asked: can a library .so file on Linux be re-compiled and reloaded at runtime?
Of course yes: you should fork a command to build the library from the generated C code (e.g. a `gcc -O -fPIC -shared generated.c -o generated.so`, but you could do it indirectly e.g. by running a `make -j`, especially if the `generated.so` is big enough to make it relevant to split the `generated.c` into several generated C files!) and then you dynamically load your library with dlopen (giving it a full path like `/some/file/path/to/generated.so`, and probably the `RTLD_NOW` flag) and you have to use `dlsym` to find relevant symbols inside. Don't think of re-loading (a second time) the same `generated.so`; better to emit a unique `generated1.c` (then `generated2.c` etc...) C file, then to compile it to a unique `generated1.so` (the second time to `generated2.so`, etc...), then to `dlopen` it (and this can be done many hundreds of thousands of times). You may want to have, in the emitted `generated*.c` files, some constructor functions which would be executed at `dlopen` time of the `generated*.so`.
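The compile-then-load cycle described above can be sketched roughly as follows. This is only an illustration under some assumptions: `gcc` is on the `PATH`, the directory is an absolute path without spaces, and the helper names (`build_compile_command`, `compile_and_load`, `sample_generated_source`) are hypothetical. The sample source uses GCC's `__attribute__((constructor))` to run code at `dlopen` time.

```c
#include <dlfcn.h>
#include <stdio.h>
#include <stdlib.h>

/* Example of what an emitted generatedN.c could contain (illustrative). */
const char sample_generated_source[] =
    "#include <stdio.h>\n"
    "__attribute__((constructor)) static void on_load(void) {\n"
    "    puts(\"generated module loaded\");\n"
    "}\n"
    "int dynbar(int a, int b) { return a + b; }\n";

/* Build the gcc command for DIR/generated<serial>.c; 0 on success, -1 if
   the command does not fit into CMD. */
int build_compile_command(char *cmd, size_t cmdsz, const char *dir, int serial) {
    int n = snprintf(cmd, cmdsz,
                     "gcc -O -fPIC -shared %s/generated%d.c -o %s/generated%d.so",
                     dir, serial, dir, serial);
    return (n < 0 || (size_t)n >= cmdsz) ? -1 : 0;
}

/* Compile DIR/generated<serial>.c and dlopen the resulting shared object.
   Each serial number gives a fresh .so, so nothing is ever re-loaded.
   Returns the dlopen handle, or NULL on failure. */
void *compile_and_load(const char *dir, int serial) {
    char cmd[1024], so_path[512];
    if (build_compile_command(cmd, sizeof cmd, dir, serial) != 0)
        return NULL;
    if (system(cmd) != 0) {        /* forks a shell that runs gcc */
        fprintf(stderr, "compilation failed: %s\n", cmd);
        return NULL;
    }
    int n = snprintf(so_path, sizeof so_path, "%s/generated%d.so", dir, serial);
    if (n < 0 || (size_t)n >= sizeof so_path)
        return NULL;
    void *handle = dlopen(so_path, RTLD_NOW);
    if (!handle)
        fprintf(stderr, "dlopen: %s\n", dlerror());
    return handle;
}
```

The caller would increment the serial number for every newly generated file. With older glibc versions the program has to be linked with `-ldl`.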
Your base application program should have defined a convention about the set of dlsym-ed names (usually functions) and how they are called. It should only directly call functions in your `generated*.so` thru `dlsym`-ed function pointers. In practice you would decide for example that each `generated*.c` defines a function `void dynfoo(int)` and `int dynbar(int,int)`, use `dlsym` with `"dynfoo"` and `"dynbar"`, and call these thru function pointers (returned by `dlsym`). You should also define conventions of how and when these `dynfoo` and `dynbar` would be called. You had better link your base application with `-rdynamic` so that your `generated*.c` files can call your application functions.
You don't want your `generated*.so` to re-define existing names. For instance, you don't want to redefine `malloc` in your `generated*.c` and expect all heap allocation functions to magically use your new variant (that probably won't work, and even if it did, it would be dangerous).
You probably won't bother to `dlclose` a dynamically loaded shared object, except at application clean-up and exit time (but I don't bother at all to `dlclose`). If you do `dlclose` some dynamically loaded `generated*.so` file, be sure that nothing in it is still used: no pointers into it, not even return addresses in call frames, may remain.
P.S. the MELT translator is currently 57KLOC of MELT code translated to nearly 1770KLOC of C code.
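As an illustration of the `dynfoo` / `dynbar` convention above, here is a minimal sketch of the application side. The `GeneratedModule` struct, `load_generated_module` and `app_log` are hypothetical names chosen for this example; only `dlopen`, `dlsym`, `dlclose` and `dlerror` are real API.

```c
#include <dlfcn.h>
#include <stdio.h>

/* The agreed convention: every generated*.so defines these two functions. */
typedef void (*dynfoo_fn)(int);
typedef int (*dynbar_fn)(int, int);

typedef struct {
    void *handle;      /* result of dlopen, NULL when not loaded */
    dynfoo_fn dynfoo;
    dynbar_fn dynbar;
} GeneratedModule;

/* An application function that generated code may call back. It is only
   visible to the loaded .so if the application is linked with -rdynamic. */
void app_log(const char *msg) {
    fprintf(stderr, "[app] %s\n", msg);
}

/* Load SO_PATH and resolve the conventional symbols.
   Returns 0 on success; on failure returns -1 and leaves *mod zeroed. */
int load_generated_module(const char *so_path, GeneratedModule *mod) {
    *mod = (GeneratedModule){0};
    void *handle = dlopen(so_path, RTLD_NOW);
    if (!handle) {
        fprintf(stderr, "dlopen: %s\n", dlerror());
        return -1;
    }
    /* POSIX requires that the void* returned by dlsym can be converted
       to a function pointer, even though ISO C leaves this undefined. */
    dynfoo_fn foo = (dynfoo_fn)dlsym(handle, "dynfoo");
    dynbar_fn bar = (dynbar_fn)dlsym(handle, "dynbar");
    if (!foo || !bar) {
        fprintf(stderr, "%s does not define dynfoo and dynbar\n", so_path);
        dlclose(handle);   /* safe here: nothing from it is in use yet */
        return -1;
    }
    mod->handle = handle;
    mod->dynfoo = foo;
    mod->dynbar = bar;
    return 0;
}
```

After a successful load, the application calls `mod.dynfoo(42)` or `mod.dynbar(1, 2)` only through these pointers. Building the application with something like `gcc -rdynamic app.c -ldl` lets the generated code call `app_log`.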
Your best bet's probably the TCC compiler, which allows you to do exactly this --- compile source code, add it to your program, run it, all without touching files.
For a more robust but non-C-based solution, you should probably check out the LLVM project, which does much the same thing but from the perspective of producing JITs. You don't get to go via C, instead using a kind of abstract portable machine code, but the generated code is loads faster and it's under more active development.
OTOH if you want to do it all manually by shelling out to gcc, compiling a `.so` and then loading it yourself, `dlopen()` and `dlclose()` will do what you want.
If you want to reload a library dynamically, you can use the `dlopen` function (see the man pages). It opens a library .so file and returns a `void*` handle to it; then you can get a pointer to any function or variable of your library with `dlsym`.
To compile your libraries in memory, well, the best thing I think you can do is to create an in-memory filesystem.