Please note: This PhD defence will take place in DC 2314.
Dake Zhang, PhD candidate
David R. Cheriton School of Computer Science
Supervisor: Professor Mark Smucker
Online news shapes how people form opinions on topics such as science, health, and politics, yet the same environment that makes it widely accessible also enables low-quality and deceptive content to propagate at scale. This thesis studies how AI systems can help readers assess the trustworthiness of online news. Rather than predicting whether an article is true or false, we treat trustworthiness assessment as a process to be supported: a useful system should help readers ask the investigative questions that a careful reader would ask, and synthesize the external evidence and context needed to answer them. This framing is grounded in lateral reading, the strategy professional fact-checkers use when evaluating online information by searching beyond the page itself.
We pursue system development and evaluation together because the two are inseparable in a new research area. This research program began with ReadProbe, a proof-of-concept retrieval-augmented LLM system that generated investigative questions, retrieved web evidence, and produced attributed answers. A subsequent pilot study, originally intended as a formative step toward a larger human-baseline study of question generation, characterized the kinds of questions university-affiliated readers wrote before and after brief lateral-reading guidance, and revealed useful design lessons. These lessons motivated the TREC 2024 Lateral Reading Track, the first shared benchmark in this area, which formalized question generation and document retrieval as foundational tasks. Results showed that question generation remains challenging even for frontier LLMs, while document retrieval is comparatively mature. A follow-up analysis further revealed that LLM-generated question lists are less diverse than those written by human experts and overlap little with them, suggesting that alignment with expert investigative priorities is the core open problem.
These observations led to the TREC 2025 DRAGUN Track, which shifted the benchmark toward a more reader-oriented setting by introducing report generation as the main task and replacing direct grading with expert-authored, importance-weighted rubrics built through open-web research. To support the track, we developed an iterative multi-agent RAG system that simulates a lateral reader by interleaving query generation, multi-stage segment retrieval, information sufficiency evaluation, question generation, and report writing, which served as a strong starter-kit baseline. Finally, to make rubric-based evaluation reusable beyond the originally judged runs, we released an LLM-based AutoJudge system that mirrors the human assessment protocol and closely reproduces the run-level rankings obtained from the official human judgments.
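Agreement between an automatic judge and human assessors at the run level is commonly summarized with a rank correlation. The sketch below assumes Kendall's tau-b as that measure (the announcement does not say which statistic was used) and compares two sets of per-run scores. The run names and scores are made-up placeholders for illustration, not results from the track.

```python
from itertools import combinations
from math import sqrt


def kendall_tau_b(x: list[float], y: list[float]) -> float:
    """Kendall's tau-b between two score lists over the same runs (handles ties)."""
    if len(x) != len(y):
        raise ValueError("score lists must cover the same runs")
    concordant = discordant = ties_x_only = ties_y_only = 0
    for i, j in combinations(range(len(x)), 2):
        dx = x[i] - x[j]
        dy = y[i] - y[j]
        if dx == 0 and dy == 0:
            continue  # tied in both rankings: counts toward neither term
        if dx == 0:
            ties_x_only += 1
        elif dy == 0:
            ties_y_only += 1
        elif (dx > 0) == (dy > 0):
            concordant += 1
        else:
            discordant += 1
    denom = sqrt((concordant + discordant + ties_x_only) *
                 (concordant + discordant + ties_y_only))
    return (concordant - discordant) / denom if denom else 0.0


# Hypothetical per-run scores from human judges and from an automatic judge.
human = {"runA": 0.42, "runB": 0.35, "runC": 0.28, "runD": 0.30}
auto = {"runA": 0.40, "runB": 0.37, "runC": 0.25, "runD": 0.22}

runs = sorted(human)  # fix one run order so both score lists align
tau = kendall_tau_b([human[r] for r in runs], [auto[r] for r in runs])
print(f"Kendall's tau-b: {tau:.3f}")  # one swapped pair (runC, runD) out of six
```

A value near 1 means the automatic judge orders runs almost exactly as the human judges do, which is what matters when the evaluator is reused to compare new systems.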
Three key findings emerge from this research. First, generating investigative questions that align with expert priorities remains hard: even the strongest system in the DRAGUN track covered only about one-third of the importance-weighted rubric question space on average. Second, given a useful question, document retrieval is no longer the central bottleneck. The harder problems now sit at the planning and synthesis ends of the pipeline. Third, evaluation matters as much as system design, and rubric-based evaluation grounded in expert open-web research is a more useful and interpretable target than direct grading of system outputs.
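To make "covering one-third of the importance-weighted rubric question space" concrete, the sketch below assumes a simple definition: for each topic, coverage is the total weight of the rubric questions a report answers divided by the total rubric weight, and the reported figure is the mean over topics. The `RubricItem` type, the function names, and the example questions and weights are hypothetical; the official DRAGUN scoring may differ (for example, by awarding partial credit).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RubricItem:
    question: str
    weight: float  # expert-assigned importance (assumed positive)


def weighted_coverage(rubric: list[RubricItem], covered: set[str]) -> float:
    """Fraction of total rubric weight whose questions the report answers."""
    total = sum(item.weight for item in rubric)
    if total <= 0:
        raise ValueError("rubric needs a positive total weight")
    hit = sum(item.weight for item in rubric if item.question in covered)
    return hit / total


def mean_coverage(topics: list[tuple[list[RubricItem], set[str]]]) -> float:
    """Average per-topic weighted coverage across all topics."""
    scores = [weighted_coverage(rubric, covered) for rubric, covered in topics]
    return sum(scores) / len(scores)


# Hypothetical rubrics and the questions one report was judged to answer.
rubric_1 = [
    RubricItem("Who funds the outlet?", 3.0),
    RubricItem("What do other sources report?", 2.0),
    RubricItem("Who is the author?", 1.0),
]
covered_1 = {"What do other sources report?"}  # 2 of 6 weight units

rubric_2 = [
    RubricItem("Is the cited study peer-reviewed?", 2.0),
    RubricItem("Has the claim been fact-checked?", 2.0),
]
covered_2 = {"Has the claim been fact-checked?"}  # 2 of 4 weight units

avg = mean_coverage([(rubric_1, covered_1), (rubric_2, covered_2)])
print(f"mean weighted coverage: {avg:.3f}")
```

Weighting by importance means that answering one high-priority question can count for more than answering several peripheral ones, which is how the rubric encodes expert investigative priorities.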
Overall, this thesis contributes a reader-centered framework for assistive AI in news trustworthiness assessment, along with shared tasks, datasets, systems, and a reusable evaluator that together support continued progress toward AI tools that help readers think more carefully about online news rather than think for them.