Getting the original variable name for an LLVM Value

Posted 2019-01-16 20:46

Question:

The operands for an llvm::User (e.g. instruction) are llvm::Values.

After the mem2reg pass, variables are in SSA form, and the names that tie them to the original source code are lost. Value::getName() is only set for some things; for most variables, which are intermediaries, it's not set.

The instnamer pass can be run to give all the variables names like tmp1 and tmp2, but this doesn't capture where they originally come from. Here's some LLVM IR beside the original C code:

I am building a simple HTML page to visualise and debug some optimisations I am working on, and I want to show the SSA variables in name-plus-version ("namever") notation rather than just temporary instnamer names. It's just to aid readability.
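For concreteness, here is a minimal sketch of what such labels could look like, assuming "namever" means the source-level name followed by a per-name version counter. The SsaLabeler class and the underscore separator are hypothetical choices for illustration, not anything LLVM provides.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical helper: hands out "name_N" labels, where N counts how many
// SSA definitions of the same source variable have been seen so far.
class SsaLabeler {
public:
    std::string next(const std::string& sourceName) {
        assert(!sourceName.empty() && "caller must supply a source name");
        unsigned version = ++counters_[sourceName];  // first use yields 1
        return sourceName + "_" + std::to_string(version);
    }

private:
    std::map<std::string, unsigned> counters_;  // per-name version counter
};
```

Each time a new SSA value is found to belong to a source variable, calling next() with that variable's name gives the label to display in place of the instnamer name.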

I am getting my LLVM IR from clang with a commandline such as:

 clang -g3 -O1 -emit-llvm -o test.bc -c test.c

There are calls to llvm.dbg.declare and llvm.dbg.value in the IR; how do you turn these into the original source-code names and SSA version numbers?

So how can I determine the original variable name (or named-constant name) from an llvm::Value? Debuggers must be able to do this, so how can I?

Answer 1:

This is part of the debug information that's attached to LLVM IR in the form of metadata. Documentation is here. An old blog post with some background is also available.


$ cat  > z.c
long fact(long arg, long farg, long bart)
{
    long foo = farg + bart;
    return foo * arg;
}

$ clang -emit-llvm -O3 -g -c z.c
$ llvm-dis z.bc -o -

Produces this:

define i64 @fact(i64 %arg, i64 %farg, i64 %bart) #0 {
entry:
  tail call void @llvm.dbg.value(metadata !{i64 %arg}, i64 0, metadata !10), !dbg !17
  tail call void @llvm.dbg.value(metadata !{i64 %farg}, i64 0, metadata !11), !dbg !17
  tail call void @llvm.dbg.value(metadata !{i64 %bart}, i64 0, metadata !12), !dbg !17
  %add = add nsw i64 %bart, %farg, !dbg !18
  tail call void @llvm.dbg.value(metadata !{i64 %add}, i64 0, metadata !13), !dbg !18
  %mul = mul nsw i64 %add, %arg, !dbg !19
  ret i64 %mul, !dbg !19
}

With -O0 instead of -O3, you won't see llvm.dbg.value, but you will see llvm.dbg.declare.
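To make the link in the listing above concrete: each llvm.dbg.value call pairs an SSA value with a metadata node that describes a source variable. The sketch below pulls those pairs out of the textual IR with a regular expression. It is only an illustration, and it assumes the old "metadata !{ty %v}, i64 off, metadata !N" spelling shown here with simple type names such as i64. The function name dbgValueBindings is made up, and a real tool would walk the instructions through the LLVM API rather than scrape text.

```cpp
#include <cassert>
#include <regex>
#include <string>
#include <utility>
#include <vector>

// Returns (SSA value, metadata id) pairs such as ("%add", "!13"), one per
// llvm.dbg.value call found in the given IR text, in order of appearance.
std::vector<std::pair<std::string, std::string>>
dbgValueBindings(const std::string& irText) {
    // Group 1: the SSA value name; group 2: the variable's metadata id.
    static const std::regex call(
        R"re(@llvm\.dbg\.value\(metadata !\{\w+ (%[\w.]+)\}, i64 \d+, metadata (!\d+)\))re");

    std::vector<std::pair<std::string, std::string>> bindings;
    for (std::sregex_iterator it(irText.begin(), irText.end(), call), end;
         it != end; ++it) {
        const std::smatch& m = *it;
        assert(m.size() == 3);  // whole match plus the two capture groups
        bindings.emplace_back(m[1].str(), m[2].str());
    }
    return bindings;
}
```

Run over the function above, this would give four pairs: %arg with !10, %farg with !11, %bart with !12, and %add with !13. The variable names themselves live in those metadata nodes, which are not shown in the listing.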



Answer 2:

Given a Value, you can get the variable name by traversing all the llvm.dbg.declare and llvm.dbg.value calls in the enclosing function, checking whether any of them refers to that value, and if so, returning the DIVariable that the intrinsic call associates with the value.

So, the code should look something like (roughly, not tested or even compiled):

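// Header paths are for LLVM 3.5/3.6; in 3.4 and earlier, InstIterator.h is
// under llvm/Support/ and DebugInfo.h is directly under llvm/.
#include "llvm/IR/DebugInfo.h"
#include "llvm/IR/Function.h"
#include "llvm/IR/InstIterator.h"
#include "llvm/IR/IntrinsicInst.h"

using namespace llvm;

const Function* findEnclosingFunc(const Value* V) {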
  if (const Argument* Arg = dyn_cast<Argument>(V)) {
    return Arg->getParent();
  }
  if (const Instruction* I = dyn_cast<Instruction>(V)) {
    return I->getParent()->getParent();
  }
  return NULL;
}

const MDNode* findVar(const Value* V, const Function* F) {
  for (const_inst_iterator Iter = inst_begin(F), End = inst_end(F); Iter != End; ++Iter) {
    const Instruction* I = &*Iter;
    if (const DbgDeclareInst* DbgDeclare = dyn_cast<DbgDeclareInst>(I)) {
      if (DbgDeclare->getAddress() == V) return DbgDeclare->getVariable();
    } else if (const DbgValueInst* DbgValue = dyn_cast<DbgValueInst>(I)) {
      if (DbgValue->getValue() == V) return DbgValue->getVariable();
    }
  }
  return NULL;
}

StringRef getOriginalName(const Value* V) {
  // TODO handle globals as well

  const Function* F = findEnclosingFunc(V);
  if (!F) return V->getName();

  const MDNode* Var = findVar(V, F);
  if (!Var) return "tmp";

  return DIVariable(Var).getName();
}

You can see above I was too lazy to add handling of globals, but it's not that big a deal actually - this requires iterating over all the globals listed under the current compile unit debug info (use M.getNamedMetadata("llvm.dbg.cu") to get a list of all the compile units in the current module), then checking which matches your variable (via the getGlobal method) and returning its name.

However, keep in mind the above will only work for values directly associated with original variables. Any value that is a result of any computation will not be properly named this way; and in particular, values that represent field accesses will not be named with the field name. This is doable but requires more involved processing - you'll have to identify the field number from the GEP, then dig into the type debug information for the struct to get back the field name. Debuggers do that, yes, but no debugger operates in LLVM IR land - as far as I know even LLVM's own LLDB works differently, by parsing the DWARF in the object file into Clang types.
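The field-name lookup described above can be sketched without LLVM at all. The following is a toy model, not the LLVM API: StructDebugInfo is a hypothetical stand-in for the member list held in a struct's type debug information, and the GEP is reduced to its constant indices. It assumes the usual struct GEP shape, where the first index steps over the pointer and the second selects the field, and that IR field numbers line up with the member list. That assumption can fail, for example when the front end inserts padding fields or for bitfields.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for what a struct's type debug info provides.
struct StructDebugInfo {
    std::string name;                  // source-level struct name
    std::vector<std::string> members;  // member names in declaration order
};

// Maps the constant indices of a GEP into a struct to "Struct.field".
// Returns an empty string when the indices do not select a known member.
std::string fieldAccessName(const StructDebugInfo& info,
                            const std::vector<std::uint64_t>& gepIndices) {
    if (gepIndices.size() < 2) return "";      // no field index present
    const std::uint64_t field = gepIndices[1]; // second index picks the field
    if (field >= info.members.size()) return "";
    return info.name + "." + info.members[field];
}
```

In a real pass, the indices would come from the GEP instruction and the member list from the composite type's debug metadata. This sketch only shows the final index-to-name step.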



Answer 3:

I had a similar requirement: converting the IR into "SSA variables as VarNamever notation". The following documentation and links helped me:

1) https://releases.llvm.org/3.4.2/docs/tutorial/LangImpl7.html
2) "LLVM opt mem2reg has no effect"

Hope this helps the community!!!