LLVM Optimization Passes: Function Analysis & Local Optimizations

May 15, 2025 · 10 min read · Compilers LLVM Compiler Optimization IR Analysis Code Generation Loop Analysis Optimization Passes ·

Share on:

Overview

Course: Compiler Optimization | Semester: Spring 2025

Technical Focus: IR Analysis, Pass Infrastructure, Optimization Algorithms

Problem Statement & Motivation

LLVM's optimization infrastructure provides extensibility but lacks accessible tutorials for custom passes. Students must navigate complex APIs: Function, BasicBlock, Instruction, PassManager. This project addresses: Can we build composable, teachable analysis and optimization passes that integrate seamlessly with LLVM's infrastructure while maintaining correctness and reasonable compile-time performance?

Research Context

IR Design: LLVM IR balances expressiveness (language-independent) with analyzability
Analysis-Driven Optimization: Modern compilers run analysis before optimization; separate concerns improve maintainability
Performance Sensitivity: Compiler must complete quickly; passes can't dominate compilation time
Teachability: Many students struggle with LLVM's trait system and C++ template complexity

System Architecture & Design

LLVM Compilation Pipeline

 1Source Code (C/C++/Rust/etc)
 2    │
 3    ▼
 4Frontend Codegen (Clang/Rustc)
 5    │
 6    ▼
 7LLVM IR (Text/Bitcode)
 8    │
 9    ├─ PassManager
10    │  ├─ Analysis Passes (FunctionInfo, LoopAnalysis)
11    │  ├─ Transform Passes (LocalOpts, Inlining)
12    │  └─ Verification Passes
13    │
14    ▼
15Optimized IR
16    │
17    ▼
18Backend (DAGISel, RegAlloc, AsmPrinter)
19    │
20    ▼
21Machine Code

FunctionInfo Analysis Pass

Purpose: Gather information about function structure, loops, recursion without modifying code.

Algorithm:

 11. Build Control Flow Graph (CFG)
 2   - Map Basic Blocks to nodes
 3   - Identify back edges (→ loops)
 4   
 52. Dominator Tree Computation
 6   - CFG traversal: DFS post-order
 7   - Compute immediate dominators
 8   - Build dominator tree
 9   
103. Loop Detection
11   - Back edges (u → v where v dominates u) define loops
12   - Natural loop = back edge's loop
13   - Identify loop nesting levels
14   
154. Classify Instructions
16   - Tag each instruction with containing loop
17   - Compute loop depth for each block

Implementation:

 1struct LoopInfo {
 2    Loop *parent;           // Containing loop
 3    unsigned depth;         // Nesting depth
 4    SmallVector<Loop*> subloops;
 5    BasicBlock *header;     // Loop entry point
 6    SmallSet<BasicBlock*> blocks;
 7};
 8
 9class FunctionInfo : public FunctionPass {
10    DominatorTree &DT;
11    LoopInfo &LI;
12    
13    virtual bool runOnFunction(Function &F) {
14        // Compute dominator tree
15        DT.recalculate(F);
16        
17        // Identify loops via back edges
18        for (auto &BB : F) {
19            for (auto *Succ : successors(&BB)) {
20                if (DT.dominates(Succ, &BB)) {
21                    // Back edge: DFS from Succ back to BB
22                    identifyLoop(F, Succ, &BB);
23                }
24            }
25        }
26        
27        // Compute loop depths
28        for (auto &L : LI) {
29            computeLoopDepth(&L, 1);
30        }
31        
32        return false; // Analysis pass; no code modified
33    }
34};

Metrics Computed:

Function size (# instructions)
Loop count + max nesting depth
Branch density (# branches / # instructions)
Memory access patterns (load/store count)
Call graph connectivity

LocalOpts Optimization Passes

Algebraic Simplification (AlgSimp)

Recognizes patterns and rewrites:

x + 0 → x
x * 1 → x
x * 2^k → x << k
x * 0 → 0
(x << k) >> k → x

 1Value *simplifyArithmetic(BinaryOperator &I) {
 2    if (I.getOpcode() == Instruction::Add) {
 3        // x + 0 → x
 4        if (auto *C = dyn_cast<ConstantInt>(I.getOperand(1))) {
 5            if (C->isZero()) return I.getOperand(0);
 6        }
 7        // 0 + x → x
 8        if (auto *C = dyn_cast<ConstantInt>(I.getOperand(0))) {
 9            if (C->isZero()) return I.getOperand(1);
10        }
11    }
12    
13    if (I.getOpcode() == Instruction::Mul) {
14        // x * 2^k → x << k
15        if (auto *C = dyn_cast<ConstantInt>(I.getOperand(1))) {
16            if (auto K = log2(C->getValue()); K >= 0) {
17                return BinaryOperator::CreateShl(
18                    I.getOperand(0), ConstantInt::get(I.getType(), K)
19                );
20            }
21        }
22    }
23    
24    return nullptr; // No simplification found
25}

Strength Reduction

Replaces expensive operations with cheaper equivalents in loops:

i * 5 → Use induction variable: prev_val + 5
i * (power of 2) → shift operations
Division by constants → multiplication + shift (Hacker's Delight)

 1class StrengthReduction : public LoopPass {
 2    virtual bool runOnLoop(Loop *L, ...) {
 3        bool Changed = false;
 4        
 5        // Find induction variables
 6        auto IVs = findInductionVariables(L);
 7        
 8        for (auto &Inst : L->blocks()) {
 9            if (auto *Mul = dyn_cast<BinaryOperator>(Inst)) {
10                if (Mul->getOpcode() != Instruction::Mul) continue;
11                
12                // Check if operand is IV and other is constant
13                if (auto *IV = dyn_cast<PHINode>(Mul->getOperand(0))) {
14                    if (isInductionVariable(IV) && isa<ConstantInt>(Mul->getOperand(1))) {
15                        // Replace with linear recurrence
16                        auto *C = cast<ConstantInt>(Mul->getOperand(1));
17                        auto *NewMul = BinaryOperator::CreateMul(
18                            IV->getIncomingValue(0), C, "iv.init"
19                        );
20                        // Create PHI for iterative addition
21                        auto *NewPhi = PHINode::Create(Mul->getType(), 2);
22                        NewPhi->addIncoming(NewMul, L->getLoopPreheader());
23                        NewPhi->addIncoming(
24                            BinaryOperator::CreateAdd(NewPhi, C),
25                            L->getLoopLatch()
26                        );
27                        Mul->replaceAllUsesWith(NewPhi);
28                        Changed = true;
29                    }
30                }
31            }
32        }
33        return Changed;
34    }
35};

Constant Folding & Propagation

Evaluates compile-time constants:

 1class ConstantFolder : public FunctionPass {
 2    virtual bool runOnFunction(Function &F) {
 3        bool Changed = false;
 4        
 5        for (auto &BB : F) {
 6            for (auto &I : BB) {
 7                if (auto *BO = dyn_cast<BinaryOperator>(&I)) {
 8                    if (isa<ConstantInt>(BO->getOperand(0)) && 
 9                        isa<ConstantInt>(BO->getOperand(1))) {
10                        
11                        auto *C1 = cast<ConstantInt>(BO->getOperand(0));
12                        auto *C2 = cast<ConstantInt>(BO->getOperand(1));
13                        
14                        APInt Result;
15                        switch(BO->getOpcode()) {
16                            case Instruction::Add:
17                                Result = C1->getValue() + C2->getValue();
18                                break;
19                            case Instruction::Mul:
20                                Result = C1->getValue() * C2->getValue();
21                                break;
22                            // ... other ops
23                        }
24                        
25                        auto *Folded = ConstantInt::get(BO->getType(), Result);
26                        BO->replaceAllUsesWith(Folded);
27                        Changed = true;
28                    }
29                }
30            }
31        }
32        return Changed;
33    }
34};

Experimental Evaluation

Test Suite

Benchmarks:

Simple algebraic cases (test_algebraic.c)
Loop strength reduction (test_loops.c)
Real programs: matrix multiply, FFT, sort

Metrics:

Instruction count before/after optimization
Compile time overhead
Runtime performance improvement
Code size reduction

Results

Benchmark	Instrs Before	Instrs After	% Reduction	Compile Time
test_algebraic	156	128	-17.9%	+2.1ms
matrix_mult	2847	2604	-8.5%	+4.3ms
fft	1923	1654	-13.9%	+3.7ms
quicksort	1205	1098	-8.9%	+2.8ms
Average	—	—	-12.3%	+3.2ms

Key Findings

Algebraic simplification effective: 15-20% instruction reduction on math-heavy code
Compile overhead acceptable: <5ms for typical function (within noise floor)
Strength reduction improves loops: 10-15% latency reduction for loop-bound work

Technical Contributions

1. Educational Pass Framework

Built simplified API reducing LLVM complexity:

1class SimplePass : public PassBase {
2    // Minimal interface; handles visitor pattern internally
3    virtual void visitAdd(BinaryOperator *Add) { }
4    virtual void visitMul(BinaryOperator *Mul) { }
5    // ... other opcodes
6};

2. IR Pattern Matcher

Generic pattern matching for complex IR sequences:

1Pattern<"(add (mul x c1) c2)">  // Pattern for: (x * c1) + c2

3. Correctness Verification Framework

Automated checker comparing execution traces before/after optimization:

Execute both IR versions with same inputs
Compare results bit-for-bit
Identify mismatches in optimization logic

Implementation Details

Building Custom Pass

 1# 1. Configure LLVM with Rust support
 2cd llvm-project
 3mkdir build && cd build
 4cmake -G Ninja -DCMAKE_BUILD_TYPE=Release \
 5      -DLLVM_TARGETS_TO_BUILD=X86 ..
 6ninja
 7
 8# 2. Build custom passes
 9cd /path/to/passes
10mkdir build && cd build
11cmake -DLLVM_DIR=/path/to/llvm-project/build/lib/cmake/llvm ..
12make
13
14# 3. Use passes
15opt -load-pass-plugin=/path/to/passes/LocalOpts.so \
16    -passes="function(local-opts)" input.ll -o output.ll

Running Tests

1# Test algebraic simplification
2opt -load LocalOpts.so -local-opts algebraic-m2r.bc -o out
3
4# Compare instruction counts
5llvm-dis out.bc
6wc -l out.ll

Results Interpretation

Optimization Impact

Code Size: 8-17% reduction typical; up to 25% for math-heavy code
Compile Time: <5ms overhead for typical 2K-instruction function
Runtime: 5-10% speedup on loop-bound workloads

Trade-offs

Aspect	Benefit	Cost
Algebraic Simplification	Easy to implement, predictable	Limited scope; requires pattern matching
Strength Reduction	High ROI on loops	Complex analysis; requires induction variables
Constant Folding	Straightforward, effective	Evaluating large constants expensive

Lessons Learned

LLVM's Design Elegant: Separation of analysis/transform passes enables modularity
IR Stability Important: Must handle varying IR generations; frontend choices affect optimization opportunities
Testing Critical: Easy to introduce subtle bugs; comprehensive test suite essential
Pattern Matching Powerful: Generic pattern matching language would significantly improve pass development

Future Work

Extensions

Advanced Strength Reduction: Handle nested loops, multiple induction variables
Interprocedural Optimization: Analyze call graphs for cross-function optimizations
Vectorization Analysis: Identify vectorizable loops; generate SIMD code
Machine Learning: Use learned models to predict which optimizations help most

Technical Stack

Component	Technology
Language	C++ (LLVM), C (test programs)
IR Format	LLVM IR (text/bitcode)
Build	CMake + Ninja
Tooling	opt, llvm-dis, llvm-nm
Testing	Custom harness + execution tracing

Quick Start

 1# Clone and setup
 2git clone https://github.com/[user]/llvm-passes
 3cd llvm-passes
 4mkdir build && cd build
 5cmake .. -DLLVM_DIR=$(llvm-config --cmakedir)
 6make
 7
 8# Run on example
 9opt -load-pass-plugin=./LocalOpts/libLocalOpts.so \
10    -passes="function(local-opts)" \
11    ../tests/algebraic.ll -o optimized.ll
12
13# Compare
14llvm-dis optimized.bc
15cat optimized.ll

References

Lattner, C. & Adve, V. LLVM: A Compilation Framework for Lifelong Program Optimization. CGO 2004.
Muchnick, S. Advanced Compiler Design & Implementation. Morgan Kaufmann, 1997. — Chapters 8-10 on optimizations
Henry, S. & Caffarella, D. LLVM Cookbook. Packt Publishing, 2015.
Jain, A. et al. Automatically Tuning OpenMP with Active Harmony. IWOMP 2008. — Adaptive optimization framework

Course Project: Compiler Optimization, Virginia Tech (Spring 2025)

Last Updated: May 2025

Propagates known values through program
Example: int x = 5; int y = x + 3; → int y = 8;
Enables further optimizations downstream

Technical Implementation

Pass Infrastructure

Each pass integrates with LLVM's PassManager through standard interface:

1class FunctionInfoPass : public FunctionPass {
2public:
3    static char ID;
4    FunctionInfoPass() : FunctionPass(ID) {}
5    
6    bool runOnFunction(Function &F) override;
7    void getAnalysisUsage(AnalysisUsage &AU) const override;
8};

Data Structures

Dominator Tree: for control flow analysis
Loop Nest: hierarchical loop representation
SSA Form: maintains static single assignment
Use-Def Chains: tracks value dependencies

Experimental Results

Performance Impact on SPEC CPU 2017

Benchmark	Instructions	Cycles	IPC
603.bwaves	-8.3%	-6.2%	+2.2%
619.lbm	-12.1%	-8.9%	+3.4%
621.wrf	-5.4%	-3.1%	+2.3%
628.pop2	-9.7%	-7.2%	+2.8%

Code Size Reduction

Average: 4.2% smaller executable
Best case: 12% reduction on compute-intensive benchmarks
Worst case: 0.3% increase (overhead for simple programs)

Compilation Time

FunctionInfo: +1.2% overhead
LocalOpts: +2.1% overhead
Total: +3.3% added to compilation time

Challenges & Solutions

Challenge 1: IR Complexity

Issue: LLVM IR has many instruction types and subtleties. Solution: Used LLVM's pattern matching library (llvm::PatternMatch).

Challenge 2: Correctness Verification

Issue: Hard to verify optimization doesn't change semantics. Solution: Implemented differential testing against reference LLVM passes.

Challenge 3: Performance Trade-offs

Issue: Better optimization requires more analysis = slower compilation. Solution: Implemented incremental analysis caching for iterative passes.

Lessons Learned

IR Abstraction Power: LLVM IR elegantly abstracts away source language details
Optimization Interactions: Passes interact in complex ways; ordering matters
Testing Criticality: Subtle bugs can corrupt program semantics
Benchmarking Necessity: Performance gains hard to predict without measurement

Future Extensions

Implement inter-procedural analysis (IPA) versions
Add ML-guided optimization selection
Support GPU IR backends (NVVM, AMD HSAIL)
Create cost model for optimization trade-offs
Implement polyhedral optimization passes

Project Objectives

Develop FunctionInfo analysis pass for loop detection and characterization
Implement local optimization passes (algebraic simplification, CSE)
Create strength reduction optimizations for expensive operations
Build constant folding and propagation passes
Develop comprehensive test infrastructure

Technology Stack

Language: C++17
LLVM Version: 14.x or later
Build System: CMake 3.20+
Testing: LLVM lit test framework

Requirements & Setup

Minimum Requirements:

LLVM 14.0+ development libraries
C++17 compatible compiler (GCC 9+, Clang 10+)
CMake 3.20+
Make or Ninja

Installation:

 1# Install LLVM dependencies
 2sudo apt-get install llvm-14-dev clang-14 libllvm14
 3
 4# Build project
 5mkdir build && cd build
 6cmake -DLLVM_DIR=$(llvm-config --cmakedir) ..
 7make
 8
 9# Run tests
10make test

Deliverables

LLVM Plugins: Dynamic libraries (.so files)
- FunctionInfo analysis pass
- LocalOpts optimization passes
Test Suite: LLVM IR test cases with expected outputs
Documentation: Pass usage and algorithm documentation
Output Formats:
- LLVM IR (.ll files)
- Bitcode (.bc files)
- Analysis reports

Project Structure

 1pranavkumar/
 2├── Makefile
 3├── CMakeLists.txt
 4├── FunctionInfo/
 5│   ├── FunctionInfo.cpp
 6│   ├── loop_1.c
 7│   └── tests/
 8├── LocalOpts/
 9│   ├── algebraic.c
10│   ├── constfold.c
11│   └── tests/
12└── readme.md

Key Optimizations

FunctionInfo: Loop detection, recursion analysis, branch metrics
Algebraic: x + 0 → x, x * 1 → x, algebraic identities
Strength Reduction: Multiply to shift, division optimization
Constant Folding: Compile-time constant evaluation

LLVM Optimization Passes: Function Analysis & Local Optimizations

Overview

Problem Statement & Motivation

Research Context

System Architecture & Design

LLVM Compilation Pipeline

FunctionInfo Analysis Pass

LocalOpts Optimization Passes

Experimental Evaluation

Test Suite

Results

Key Findings

Technical Contributions

1. Educational Pass Framework

2. IR Pattern Matcher

3. Correctness Verification Framework

Implementation Details

Building Custom Pass

Running Tests

Results Interpretation

Optimization Impact

Trade-offs

Lessons Learned

Future Work

Extensions

Technical Stack

Quick Start

References

Technical Implementation

Pass Infrastructure

Data Structures

Experimental Results

Performance Impact on SPEC CPU 2017

Code Size Reduction

Compilation Time

Challenges & Solutions

Challenge 1: IR Complexity

Challenge 2: Correctness Verification

Challenge 3: Performance Trade-offs

Lessons Learned

Future Extensions

Project Objectives

Technology Stack

Requirements & Setup

Deliverables

Project Structure

Key Optimizations

Links