Please note: This PhD seminar will take place in DC 3301.
Joel Rorseth, PhD candidate
David R. Cheriton School of Computer Science
Supervisor: Professor Lukasz Golab
Retrieval-augmented generation (RAG) enables large language models (LLMs) to integrate external knowledge. However, when users receive undesirable outputs, LLMs cannot faithfully explain which specific external knowledge was responsible. Existing counterfactual and rule-based provenance methods can attribute outputs to influential external knowledge, but they scale poorly and attribute at a fixed, rigid granularity. To bridge these gaps, we introduce a novel framework that computes provenance efficiently at the level of propositions, identifying the atomic facts that influence LLM outputs. Our scalable system exploits the hierarchical structure of textual knowledge to decompose documents into propositions, prune redundant propositions, and identify a minimal set of “sufficient” propositions.
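As a rough illustration of the decompose, prune, and minimize steps named in the abstract, the following Python sketch may help. It is not the speaker's system: every function name is hypothetical, sentence splitting stands in for proposition decomposition, exact-match deduplication stands in for redundancy pruning, and a toy predicate stands in for the LLM-based sufficiency check. It also ignores the hierarchical structure the abstract mentions. The greedy search returns a set from which no single proposition can be dropped, which is not necessarily the smallest possible set.

```python
import re
from typing import Callable, List


def split_into_propositions(document: str) -> List[str]:
    # Assumption: one sentence = one proposition. A real system would
    # likely use a finer-grained decomposition.
    parts = re.split(r"(?<=[.!?])\s+", document.strip())
    return [p for p in parts if p]


def prune_redundant(props: List[str]) -> List[str]:
    # Assumption: "redundant" means identical after lowercasing and
    # collapsing whitespace. The first occurrence is kept.
    seen = set()
    kept = []
    for p in props:
        key = " ".join(p.lower().split())
        if key not in seen:
            seen.add(key)
            kept.append(p)
    return kept


def minimal_sufficient(
    props: List[str], is_sufficient: Callable[[List[str]], bool]
) -> List[str]:
    # Greedy deletion: drop each proposition in turn and keep the drop
    # only if the remaining set is still sufficient.
    if not is_sufficient(props):
        raise ValueError("the full proposition set is not sufficient")
    current = list(props)
    for p in props:
        trial = [q for q in current if q != p]
        if is_sufficient(trial):
            current = trial
    return current


def toy_oracle(subset: List[str]) -> bool:
    # Hypothetical stand-in for "the LLM still produces the same answer
    # when given only these propositions".
    return any("capital of France" in p for p in subset)


docs = [
    "Paris is the capital of France. France is in Europe.",
    "paris is the capital of france. The Seine flows through Paris.",
]
props = prune_redundant([p for d in docs for p in split_into_propositions(d)])
result = minimal_sufficient(props, toy_oracle)
print(result)
```

In this sketch the sufficiency check is called once per proposition plus once up front, so replacing the toy predicate with real LLM calls would make the number of propositions the dominant cost; this is the kind of scaling concern the abstract raises.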