Find loops in LLVM bytecode

Posted 2019-06-23 17:36

Question:

I want to find simple loops in LLVM bytecode and extract basic information about each loop.

For example:

 for (i=0; i<1000; i++)
    sum += i;

I want to extract the bounds [0, 1000), the loop variable "i", and the loop body (sum += i).
How should I do this?

I read the LLVM API documentation and found some useful classes such as "Loop" and "LoopInfo",
but I do not know how to use them in detail.

Could you please give me some help? A detailed usage example would be most helpful.

Answer 1:

If you do not want to use the pass manager, you can call the Analyze method of the llvm::LoopInfoBase class on each function in the IR (assuming you are using LLVM 3.4). However, Analyze takes the function's DominatorTree as input, so you have to build that first. The following code is what I tested with LLVM 3.4 (assuming you have read the IR file and converted it into a Module* named module):

for (llvm::Module::iterator func = module->begin(), end = module->end(); func != end; ++func) {
        // Declarations have no body, so there is nothing to analyze
        if (func->isDeclaration()) continue;
        // Build the dominator tree of the current function
        llvm::DominatorTree* DT = new llvm::DominatorTree();
        DT->DT->recalculate(*func);
        // Generate the LoopInfoBase for the current function
        llvm::LoopInfoBase<llvm::BasicBlock, llvm::Loop>* KLoop = new llvm::LoopInfoBase<llvm::BasicBlock, llvm::Loop>();
        KLoop->releaseMemory();
        KLoop->Analyze(DT->getBase());
        // ... query KLoop here, before it is freed ...
        delete KLoop;
        delete DT;
}

With KLoop populated, you have access to all kinds of loop information at the IR level; refer to the LoopInfoBase API for details. You will also need the following headers: "llvm/Analysis/LoopInfo.h" and "llvm/Analysis/Dominators.h".
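
To make concrete what a dominator-based loop analysis computes, here is a minimal sketch in plain C++ with no LLVM dependency. It assumes a toy CFG given as successor lists, with node 0 as the entry and every node reachable from it. The names CFG, predecessors, computeDominators, and findNaturalLoops are hypothetical and are not part of the LLVM API. An edge u -> h is treated as a back edge when h dominates u, and the loop body is h plus every block that can reach u without passing through h.

```cpp
#include <cstddef>
#include <map>
#include <set>
#include <vector>

// Toy control-flow graph: cfg[u] lists the successors of block u.
// Assumption: block 0 is the entry and every block is reachable from it.
using CFG = std::vector<std::vector<int>>;

// Invert the successor lists into predecessor lists.
std::vector<std::vector<int>> predecessors(const CFG &cfg) {
    std::vector<std::vector<int>> preds(cfg.size());
    for (std::size_t u = 0; u < cfg.size(); ++u)
        for (int v : cfg[u]) preds[v].push_back(static_cast<int>(u));
    return preds;
}

// dom[v] is the set of blocks that dominate v, computed with the
// classic iterative data-flow algorithm (adequate for small graphs).
std::vector<std::set<int>> computeDominators(const CFG &cfg) {
    const int n = static_cast<int>(cfg.size());
    std::set<int> all;
    for (int i = 0; i < n; ++i) all.insert(i);
    std::vector<std::set<int>> dom(n, all);
    if (n == 0) return dom;
    dom[0] = {0};

    const auto preds = predecessors(cfg);
    bool changed = true;
    while (changed) {
        changed = false;
        for (int v = 1; v < n; ++v) {
            // Intersect the dominator sets of all predecessors of v.
            std::set<int> next = all;
            for (int p : preds[v]) {
                std::set<int> kept;
                for (int d : next)
                    if (dom[p].count(d)) kept.insert(d);
                next = kept;
            }
            next.insert(v);  // every block dominates itself
            if (next != dom[v]) {
                dom[v] = next;
                changed = true;
            }
        }
    }
    return dom;
}

// Maps each loop header to the set of blocks in its natural loop.
std::map<int, std::set<int>> findNaturalLoops(const CFG &cfg) {
    const int n = static_cast<int>(cfg.size());
    const auto preds = predecessors(cfg);
    const auto dom = computeDominators(cfg);
    std::map<int, std::set<int>> loops;
    for (int u = 0; u < n; ++u) {
        for (int h : cfg[u]) {
            if (!dom[u].count(h)) continue;  // u -> h is not a back edge
            std::set<int> &body = loops[h];
            body.insert(h);
            // Walk backwards from u; the walk stops at h because h is
            // already in the body.
            std::vector<int> work{u};
            while (!work.empty()) {
                const int b = work.back();
                work.pop_back();
                if (!body.insert(b).second) continue;  // already visited
                for (int p : preds[b]) work.push_back(p);
            }
        }
    }
    return loops;
}
```

For the loop in the question, modelled as entry (0) -> header (1), header -> body (2) or exit (3), and body -> header, calling findNaturalLoops on the CFG {{1}, {2, 3}, {1}, {}} reports a single loop with header 1 and blocks {1, 2}.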



Answer 2:

Once you get to the LLVM IR level, the information you are asking for may no longer be accurate. For example, clang may have transformed your code so that i goes from -1000 up to 0 instead. Or it may have optimised "i" out entirely, so that there is no explicit induction variable. If you really need to extract the information exactly as it appears in the C code, then you need to look at clang, not LLVM IR. Otherwise, the best you can do is to calculate a loop trip count, in which case have a look at the ScalarEvolution pass.

Check the PowerPC hardware loops transformation pass, which demonstrates the trip count calculation fairly well: http://llvm.org/docs/doxygen/html/PPCCTRLoops_8cpp_source.html

The code is fairly heavy, but should be followable. The interesting function is PPCCTRLoops::convertToCTRLoop. If you have any further questions about that, I can try to answer them.
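
As a rough illustration of what "trip count" means here, the sketch below computes it in plain C++ for a loop of the form for (i = start; i < bound; i += step). This is not ScalarEvolution's algorithm or API. AffineLoop and tripCount are hypothetical names, and the sketch only handles a positive constant step and assumes i does not overflow.

```cpp
#include <cstdint>

// Hypothetical description of: for (i = start; i < bound; i += step)
struct AffineLoop {
    std::int64_t start;
    std::int64_t bound;
    std::int64_t step;
};

// Writes the number of iterations to `count` and returns true.
// Returns false when step <= 0, which this sketch does not handle.
bool tripCount(const AffineLoop &loop, std::uint64_t &count) {
    if (loop.step <= 0) return false;
    if (loop.start >= loop.bound) {
        count = 0;  // the body never runs
        return true;
    }
    // bound - start, done in unsigned arithmetic so that a large
    // positive bound and a large negative start cannot overflow.
    const std::uint64_t span = static_cast<std::uint64_t>(loop.bound) -
                               static_cast<std::uint64_t>(loop.start);
    const std::uint64_t step = static_cast<std::uint64_t>(loop.step);
    // ceil(span / step) without the overflow risk of span + step - 1
    count = span / step + (span % step != 0 ? 1 : 0);
    return true;
}
```

Both {0, 1000, 1} and {-1000, 0, 1} give a trip count of 1000, which is why the count can still be recovered even when the original bounds cannot.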



Answer 3:

LLVM is just a library. You won't find AST nodes there.

I suggest having a look at Clang, which is a compiler built on top of LLVM.

Maybe this is what you're looking for?



Answer 4:

Much like Matteo said, in order for LLVM to be able to recognize the loop variable and condition, the file needs to be in LLVM IR. The question says you have it in LLVM bytecode, but since LLVM IR is written in SSA form, talking about "loop variables" isn't really accurate. If you describe what you're trying to do and what kind of result you expect, I'm sure we can be of further help.

Some code to help you get started:

    virtual void getAnalysisUsage(AnalysisUsage &AU) const{
        AU.addRequired<LoopInfo>();
    }

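    bool runOnLoop(Loop* L, LPPassManager&){
        BasicBlock* h = L->getHeader();
        // getCondition() is only valid on a conditional branch
        BranchInst *bi = dyn_cast<BranchInst>(h->getTerminator());
        if (bi && bi->isConditional()) {
            Value *loopCond = bi->getCondition();
            (void)loopCond; // inspect the loop condition here
        }
        return false; // the IR was not modified
    }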

This code snippet is from inside a regular LLVM loop pass (a LoopPass subclass).



Answer 5:

Just an update on Junxzm's answer: some references, pointers, and methods have changed in LLVM 3.5.

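for (llvm::Module::iterator f = m->begin(), fe = m->end(); f != fe; ++f) {
        // Declarations have no body, so there is nothing to analyze
        if (f->isDeclaration()) continue;
        // In LLVM 3.5 the DominatorTree can be built directly on the stack
        llvm::DominatorTree DT;
        DT.recalculate(*f);
        llvm::LoopInfoBase<llvm::BasicBlock, llvm::Loop>* LoopInfo = new llvm::LoopInfoBase<llvm::BasicBlock, llvm::Loop>();
        LoopInfo->releaseMemory();
        LoopInfo->Analyze(DT);
        // ... query LoopInfo here, before it is freed ...
        delete LoopInfo;
}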


Tags: clang llvm