Say I have the following trivial C header file:
// foo1.h
typedef int foo;
typedef struct {
    foo a;
    char const* b;
} bar;
bar baz(foo*, bar*, ...);
My goal is to take this file, and produce an LLVM module that looks something like this:
%struct.bar = type { i32, i8* }
declare { i32, i8* } @baz(i32*, %struct.bar*, ...)
In other words, convert a C .h file with declarations into the equivalent LLVM IR, including type resolution, macro expansion, and so on.
Passing this through Clang to generate LLVM IR produces an empty module (as none of the declarations are actually used):
$ clang -cc1 -S -emit-llvm foo1.h -o -
; ModuleID = 'foo1.h'
target datalayout = "e-m:o-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-apple-darwin13.3.0"
!llvm.ident = !{!0}
!0 = metadata !{metadata !"clang version 3.5 (trunk 200156) (llvm/trunk 200155)"}
My first instinct was to turn to Google, and I came across two related questions: one from a mailing list, and one from StackOverflow. Both suggested using the -femit-all-decls flag, so I tried that:
$ clang -cc1 -femit-all-decls -S -emit-llvm foo1.h -o -
; ModuleID = 'foo1.h'
target datalayout = "e-m:o-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-apple-darwin13.3.0"
!llvm.ident = !{!0}
!0 = metadata !{metadata !"clang version 3.5 (trunk 200156) (llvm/trunk 200155)"}
Same result.
I've also tried disabling optimizations (both with -O0 and -disable-llvm-optzns), but that made no difference to the output. Using the following variation did produce the desired IR:
// foo2.h
typedef int foo;
typedef struct {
    foo a;
    char const* b;
} bar;
bar baz(foo*, bar*, ...);
void doThings() {
    foo a = 0;
    bar myBar;
    baz(&a, &myBar);
}
Then running:
$ clang -cc1 -S -emit-llvm foo2.h -o -
; ModuleID = 'foo2.h'
target datalayout = "e-m:o-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-apple-darwin13.3.0"
%struct.bar = type { i32, i8* }
; Function Attrs: nounwind
define void @doThings() #0 {
entry:
%a = alloca i32, align 4
%myBar = alloca %struct.bar, align 8
%coerce = alloca %struct.bar, align 8
store i32 0, i32* %a, align 4
%call = call { i32, i8* } (i32*, %struct.bar*, ...)* @baz(i32* %a, %struct.bar* %myBar)
%0 = bitcast %struct.bar* %coerce to { i32, i8* }*
%1 = getelementptr { i32, i8* }* %0, i32 0, i32 0
%2 = extractvalue { i32, i8* } %call, 0
store i32 %2, i32* %1, align 1
%3 = getelementptr { i32, i8* }* %0, i32 0, i32 1
%4 = extractvalue { i32, i8* } %call, 1
store i8* %4, i8** %3, align 1
ret void
}
declare { i32, i8* } @baz(i32*, %struct.bar*, ...) #1
attributes #0 = { nounwind "less-precise-fpmad"="false" "no-frame-pointer-elim"="false" "no-infs-fp-math"="false" "no-nans-fp-math"="false" "no-realign-stack" "stack-protector-buffer-size"="8" "unsafe-fp-math"="false" "use-soft-float"="false" }
attributes #1 = { "less-precise-fpmad"="false" "no-frame-pointer-elim"="false" "no-infs-fp-math"="false" "no-nans-fp-math"="false" "no-realign-stack" "stack-protector-buffer-size"="8" "unsafe-fp-math"="false" "use-soft-float"="false" }
!llvm.ident = !{!0}
!0 = metadata !{metadata !"clang version 3.5 (trunk 200156) (llvm/trunk 200155)"}
Besides the placeholder doThings, this is exactly what I want the output to look like! The problem is that this requires (1) using a modified version of the header, and (2) knowing the types of things in advance. Which leads me to...
Why?
Basically, I'm building an implementation for a language using LLVM to generate code. The implementation should support C interop by specifying C header files and associated libs only (no manual declarations), which will then be used by the compiler before link-time to ensure that function invocations match their signatures. Hence, I've narrowed the problem down to 2 possible solutions:
- Turn the header files into LLVM IR/bitcode, from which I can then get the type signature of each function
- Use libclang to parse the headers, then query the types from the resulting AST (my 'last resort' in case there is no sufficient answer to this question)
TL;DR
I need to take a C header file (such as the above foo1.h) and, without changing it, generate the aforementioned expected LLVM IR using Clang, OR find another way to get function signatures from C header files (preferably using libclang or building a C parser).
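For reference, one partial workaround I can think of that leaves foo1.h untouched is to generate a separate wrapper file that includes the header and takes the address of each function, which should count as a use and so make Clang emit the declaration. This is only a sketch: write_wrapper and force_emit are hypothetical names, and it still assumes the function names are known in advance (though not their types), so it does not fully solve the problem.

```c
#include <stdio.h>
#include <stddef.h>

/* Write a wrapper translation unit to `out` that includes `header` unchanged
 * and references every function in `names` from a global array, so that each
 * function is "used". Returns 0 on success, -1 on error or if count is 0
 * (an empty initializer list would not be valid C). */
int write_wrapper(FILE *out, const char *header,
                  const char *const *names, size_t count)
{
    if (count == 0)
        return -1;

    if (fprintf(out, "#include \"%s\"\n", header) < 0)
        return -1;
    if (fprintf(out, "void (*const force_emit[])(void) = {\n") < 0)
        return -1;

    for (size_t i = 0; i < count; ++i) {
        /* The cast only unifies the pointer types; the functions are never called. */
        if (fprintf(out, "    (void (*)(void))%s,\n", names[i]) < 0)
            return -1;
    }

    if (fprintf(out, "};\n") < 0)
        return -1;
    return 0;
}
```

For foo1.h and the single name baz, this writes a three-line file consisting of the #include and a one-element force_emit array. Running that file through the same clang -cc1 -S -emit-llvm command as above would then be the step to check.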