I dug a little further. Running opt -O2 -print-after-all shows that it is exactly the second EarlyCSEPass run in LLVM that performs the optimization.
Looking at the LLVM source code, the core logic is in these lines:
    // If this is a read-only or write-only call, process it. Skip store
    // MemInsts, as they will be more precisely handled later on. Also skip
    // memsets, as DSE may be able to optimize them better by removing the
    // earlier rather than later store.
    if (CallValue::canHandle(&Inst) &&
        (!MemInst.isValid() || !MemInst.isStore()) && !isa<MemSetInst>(&Inst)) {
      // If we have an available version of this call, and if it is the right
      // generation, replace this instruction.
      std::pair<Instruction *, unsigned> InVal = AvailableCalls.lookup(&Inst);
      if (InVal.first != nullptr &&
          isSameMemGeneration(InVal.second, CurrentGeneration, InVal.first,
                              &Inst) &&
          InVal.first->mayReadFromMemory() == Inst.mayReadFromMemory()) {
        LLVM_DEBUG(dbgs() << "EarlyCSE CSE CALL: " << Inst
                          << " to: " << *InVal.first << '\n');
        if (!DebugCounter::shouldExecute(CSECounter)) {
          LLVM_DEBUG(dbgs() << "Skipping due to debug counter\n");
          continue;
        }
        combineIRFlags(Inst, InVal.first);
        if (!Inst.use_empty())
          Inst.replaceAllUsesWith(InVal.first);
        salvageKnowledge(&Inst, &AC);
        removeMSSA(Inst);
        Inst.eraseFromParent();
        Changed = true;
        ++NumCSECall;
        continue;
      }

      // Increase memory generation for writes. Do this before inserting
      // the call, so it has the generation after the write occurred.
      if (Inst.mayWriteToMemory())
        ++CurrentGeneration;

      // Otherwise, remember that we have this instruction.
      AvailableCalls.insert(&Inst, std::make_pair(&Inst, CurrentGeneration));
      continue;
    }
Hmm, as far as I can tell, the pass only looks at how the callee accesses memory (whether it may read and whether it may write), as @comex said.
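To make the generation check concrete, here is a toy model of how I read that snippet. It is only a sketch, not LLVM code: ToyCall and findRedundantCalls are made-up names, a call is identified just by its callee name and a single integer argument, and the generation comparison is a plain equality check (the real isSameMemGeneration may also consult MemorySSA, which I ignore here).

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Toy stand-in for a call instruction (hypothetical, not an LLVM type).
struct ToyCall {
  std::string Callee;
  int Arg;
  bool Reads;  // models Inst.mayReadFromMemory()
  bool Writes; // models Inst.mayWriteToMemory()
};

// Returns the indices of the calls that this simplified model would remove
// as redundant, mirroring the structure of the quoted EarlyCSE snippet.
std::vector<std::size_t> findRedundantCalls(const std::vector<ToyCall> &Calls) {
  using Key = std::pair<std::string, int>;
  // Key -> (index of the available call, memory generation it was seen in).
  std::map<Key, std::pair<std::size_t, unsigned>> AvailableCalls;
  unsigned CurrentGeneration = 0;
  std::vector<std::size_t> Removed;

  for (std::size_t I = 0; I < Calls.size(); ++I) {
    const ToyCall &C = Calls[I];
    const Key K(C.Callee, C.Arg);

    // Assumption: only read-only or write-only calls are candidates.
    const bool CanHandle = !(C.Reads && C.Writes);
    if (CanHandle) {
      auto It = AvailableCalls.find(K);
      if (It != AvailableCalls.end() &&
          It->second.second == CurrentGeneration &&
          Calls[It->second.first].Reads == C.Reads) {
        // Same call, same memory generation: the later call is redundant.
        Removed.push_back(I);
        continue;
      }
      // A write starts a new generation before the call is recorded.
      if (C.Writes)
        ++CurrentGeneration;
      AvailableCalls[K] = std::make_pair(I, CurrentGeneration);
      continue;
    }

    // Any other call that may write also invalidates earlier reads.
    if (C.Writes)
      ++CurrentGeneration;
  }
  return Removed;
}
```

For the sequence foo(1) [read-only], foo(1) [read-only], bar(0) [write-only], foo(1) [read-only], this model removes only the call at index 1: the last foo(1) is kept because bar(0) bumped the generation in between. Nothing in the model looks at whether the callee can panic or diverge, only at the read/write flags.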
I also tried another thing: I copy-pasted the attributes of foo from the pure version into the panic version in the LLVM IR, and it turns out that opt can still perform the CSE successfully.