Project 1 - A Compiler Optimization Observatory — Instrumenting LLVM at Scale | Women in Computer Science

Graduate mentor's supervisor: Prof. Chengnian Sun

Every time you compile a C or C++ program, the compiler quietly applies thousands of small rewrites to make your code faster, such as simplifying "x + 0" to "x". In LLVM (the compiler infrastructure behind Clang, Swift, and Rust), one pass called InstCombine performs an enormous share of these rewrites. We have built an open-source tool, instcombine-debugger, that patches LLVM to record every transformation InstCombine performs. This project extends that tool to capture richer traces, turning an opaque, heavily used optimizer into something we can observe and understand. Live demo: https://xuhongxu.com/instcombine-instrumentor/
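To make the idea of "recording every rewrite" concrete, here is a toy sketch in Python. It is not LLVM code and does not reflect how instcombine-debugger works internally; the BinOp class, the rule names "add-zero" and "mul-one", and the tuple-based trace are all made up for illustration. It only shows the general shape: a simplifier that applies small local rules and logs each one as it fires.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BinOp:
    """A toy binary expression; operands are names, integers, or nested BinOps."""
    op: str
    lhs: object
    rhs: object


def simplify(expr, trace):
    """Apply two toy peephole rules bottom-up, appending each rewrite to trace."""
    if not isinstance(expr, BinOp):
        return expr  # leaves (variable names, constants) are left unchanged

    # Simplify the operands first so the rules below see already-reduced inputs.
    lhs = simplify(expr.lhs, trace)
    rhs = simplify(expr.rhs, trace)
    current = BinOp(expr.op, lhs, rhs)

    # Hypothetical rule "add-zero": x + 0 -> x
    if current.op == "+" and rhs == 0:
        trace.append(("add-zero", current, lhs))
        return lhs

    # Hypothetical rule "mul-one": x * 1 -> x
    if current.op == "*" and rhs == 1:
        trace.append(("mul-one", current, lhs))
        return lhs

    return current


trace = []
result = simplify(BinOp("*", BinOp("+", "x", 0), 1), trace)

print(result)  # the fully simplified expression
for rule, before, after in trace:
    print(rule, before, "->", after)
```

Each trace entry holds the rule name plus the expression before and after the rewrite, which is roughly the kind of information the real instrumentation aims to capture from InstCombine at much larger scale.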

Working in a team of 3-4, students extend the instrumentation along two axes: coverage (more passes) and enrichment (more detail per transformation). The items below are a menu, not a checklist; we will scope a realistic subset together at the start.

Short-term options (we pick a subset for the term):

Coverage: instrument other peephole optimization passes beyond InstCombine, with each member owning one pass against a shared trace format.
Enrichment: record more context per transformation, such as which rule fired, the instructions before and after, worklist state, and the conditions that enabled the rewrite.
Integration (shared): merge everyone's work into one trace schema and validate that traces are complete and consistent on real programs.
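As a rough illustration of what the integration step could involve, the sketch below checks a trace stored as one JSON object per line. The field names (seq, pass, rule, before, after), the JSON-lines layout, and the validate_trace helper are assumptions made for this example; the project's actual trace schema is something the team would define together.

```python
import json

# Hypothetical required fields for one trace record; the real schema may differ.
REQUIRED_FIELDS = {"seq", "pass", "rule", "before", "after"}


def validate_trace(lines):
    """Return a list of problems found in a JSON-lines trace (empty if none)."""
    problems = []
    expected_seq = 0
    for lineno, line in enumerate(lines, start=1):
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            problems.append(f"line {lineno}: not valid JSON")
            continue
        if not isinstance(record, dict):
            problems.append(f"line {lineno}: record is not a JSON object")
            continue

        # Completeness: every record carries the same set of fields.
        missing = REQUIRED_FIELDS - record.keys()
        if missing:
            problems.append(f"line {lineno}: missing {sorted(missing)}")
            continue

        # Consistency: sequence numbers increase by one with no gaps.
        seq = record["seq"]
        if not isinstance(seq, int):
            problems.append(f"line {lineno}: seq is not an integer")
            continue
        if seq != expected_seq:
            problems.append(f"line {lineno}: expected seq {expected_seq}, got {seq}")
        expected_seq = seq + 1

        # A recorded rewrite that changes nothing is suspicious.
        if record["before"] == record["after"]:
            problems.append(f"line {lineno}: before and after are identical")
    return problems


# Made-up sample data: the second record skips a sequence number,
# and the third one lacks the "rule" field.
sample = [
    '{"seq": 0, "pass": "instcombine", "rule": "add-zero", "before": "add i32 %x, 0", "after": "%x"}',
    '{"seq": 2, "pass": "instcombine", "rule": "mul-one", "before": "mul i32 %x, 1", "after": "%x"}',
    '{"seq": 3, "pass": "instcombine", "before": "sub i32 %x, 0", "after": "%x"}',
]
problems = validate_trace(sample)
for problem in problems:
    print(problem)
```

A check like this could run on traces collected from real programs, so that each team member's pass is held to the same shared format before the work is merged.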

Longer-term (if continuing): use the enriched traces for empirical analysis of compiler behavior, feeding into compiler testing and fuzzing research.

This project is best suited to students in second year and up, though a strong first-year student with C++ knowledge is also welcome.

Required: comfort with Python scripting and the command line; the ability to read basic C/C++ code (writing it is not required).
A plus: a compilers course, or prior exposure to LLVM.