Graduate mentor's supervisor: Prof. Chengnian Sun
AI coding agents can attempt real compiler work, but they struggle to implement optimizations. Asked to add a rewrite rule to LLVM's InstCombine pass, they often produce patches that miscompile programs, break existing tests, or put the change in the wrong place, and our benchmarking shows agents fail many such tasks. The open question is what feedback closes the gap: when the agent is handed a correctness counterexample, a profitability estimate, or a regression-test result, does its success rate improve, and which kind of feedback helps most? This project answers that question with a fixed open model in a fully observable agent loop.
Students build on an existing open-source LLVM agent harness rather than starting from scratch: the agent loop, the model backend, and an Alive2 correctness checker already exist. The team of 3-4 students adds what the harness lacks and measures what helps:
Short-term (within the term):
- Feedback tools: each member owns one tool (a profitability estimate via llvm-mca, a regression-test result, or a new signal of their own design) and wraps its output into guidance the agent can act on.
- Measurement: assess each tool's value by running the same tasks with it available and again without it, then comparing how often the agent succeeds.
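As one illustration of the llvm-mca tool, the harness could compile the IR before and after the agent's rewrite to assembly (for example with `llc`), run `llvm-mca` on each result, and compare the summary numbers. The minimal Python sketch below assumes that workflow and covers only the last step: it parses the summary block that `llvm-mca` prints and turns the comparison into a sentence the agent can act on. The function names, the choice of `Block RThroughput` as the metric, and the 5% threshold are hypothetical choices for illustration, not part of llvm-harness.

```python
from __future__ import annotations

import re

# Numeric fields from llvm-mca's default summary view.
SUMMARY_KEYS = ("Iterations", "Instructions", "Total Cycles", "Block RThroughput")


def parse_mca_summary(text: str) -> dict[str, float]:
    """Pull the numeric summary fields out of captured llvm-mca stdout."""
    fields: dict[str, float] = {}
    for key in SUMMARY_KEYS:
        pattern = rf"^\s*{re.escape(key)}:\s*([0-9]+(?:\.[0-9]+)?)"
        match = re.search(pattern, text, re.MULTILINE)
        if match:
            fields[key] = float(match.group(1))
    return fields


def profitability_guidance(before: str, after: str, threshold: float = 0.05) -> str:
    """Turn two llvm-mca reports (original vs. rewritten code) into agent-facing guidance.

    Hypothetical helper. Compares Block RThroughput, llvm-mca's estimate of the
    block's reciprocal throughput (cycles per iteration); lower is better.
    `threshold` is an assumed relative margin inside which the change is neutral.
    """
    old = parse_mca_summary(before).get("Block RThroughput")
    new = parse_mca_summary(after).get("Block RThroughput")
    if old is None or new is None:
        return "Profitability unknown: could not read Block RThroughput from llvm-mca output."
    if new < old * (1 - threshold):
        return f"Profitable: estimated cost drops from {old:g} to {new:g} cycles per iteration."
    if new > old * (1 + threshold):
        return (f"Unprofitable: estimated cost rises from {old:g} to {new:g} cycles per "
                "iteration; consider narrowing the rewrite's conditions or dropping it.")
    return (f"Neutral: estimated cost stays near {old:g} cycles per iteration; "
            "check whether the rewrite is worth adding.")
```

In use, the harness would pass in the captured stdout of the two `llvm-mca` runs. Which field to compare, and how to phrase the result so the agent changes its next patch, are exactly the design decisions each tool owner gets to make and test.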
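Because every task is run both with and without the tool, the two runs form matched pairs, and the informative tasks are the discordant ones, solved in exactly one condition. One standard way to compare paired success/failure outcomes is an exact McNemar test. The Python sketch below assumes per-task boolean outcomes keyed by a task ID (a hypothetical data layout, not llvm-harness's actual result format) and uses only the standard library.

```python
from __future__ import annotations

from math import comb


def compare_with_without(with_tool: dict[str, bool], without_tool: dict[str, bool]) -> dict:
    """Summarize a paired with/without-tool run over the same tasks.

    Both dicts map a task ID to whether the agent succeeded on that task.
    Only tasks present in both runs are compared.
    """
    tasks = sorted(with_tool.keys() & without_tool.keys())
    if not tasks:
        raise ValueError("no tasks in common between the two runs")
    n = len(tasks)
    # Discordant pairs: solved only with the tool (b) or only without it (c).
    b = sum(with_tool[t] and not without_tool[t] for t in tasks)
    c = sum(without_tool[t] and not with_tool[t] for t in tasks)
    # Exact two-sided McNemar test: if the tool makes no difference, each
    # discordant task is equally likely to favor either condition, so
    # b ~ Binomial(b + c, 0.5).
    d = b + c
    if d == 0:
        p_value = 1.0
    else:
        tail = sum(comb(d, k) for k in range(min(b, c) + 1)) / 2**d
        p_value = min(1.0, 2 * tail)
    return {
        "tasks": n,
        "success_rate_with": sum(with_tool[t] for t in tasks) / n,
        "success_rate_without": sum(without_tool[t] for t in tasks) / n,
        "only_with": b,
        "only_without": c,
        "p_value": p_value,
    }
```

The two success rates answer "how often does the agent succeed"; the p-value guards against reading noise as an effect when the task set is small. Since model sampling may make repeated runs differ even with a fixed model, running each condition more than once is worth considering.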
Longer-term (if continuing):
- Propose and compare additional feedback tools.
- Build toward a full study of which feedback signals most help agent-written optimizations.
Reference: https://github.com/dtcxzyw/llvm-harness
This project is best suited to second-year students and above, though capable first-years are welcome: the harness already exists, so the work is wrapping tools and running experiments rather than building infrastructure.
- Required: solid Python skills, command-line comfort, and basic C/C++ reading ability.
- A plus: prior exposure to LLVM IR or coding agents.