Master’s Thesis Presentation • Software Engineering • Understanding the Impact of Inputs on LLM-Based Automated Test Generation

Thursday, April 16, 2026 10:00 am - 11:00 am EDT (GMT -04:00)

Please note: This master’s thesis presentation will take place in DC 2310 and online.

Saarang Agarwal, Master’s candidate
David R. Cheriton School of Computer Science

Supervisors: Professors Pengyu Nie and Mei Nagappan

Large Language Models (LLMs) and software development agents are driving a paradigm shift in software engineering, moving the field from a human-first to an Artificial Intelligence (AI)-first approach to development. This shift has significantly improved the efficiency of tasks such as code generation and automated testing. However, the performance of LLMs and agents remains highly dependent on the quality and type of input they receive. We investigate how different types of input and contextual information influence their effectiveness through two studies, one focusing on LLMs and the other on test generation agents, each using a distinct dataset.

The first study focuses on LLMs and the impact of different inputs on their ability to generate unit tests, with particular emphasis on the availability of software requirements and the correctness of the code under test. We evaluate five state-of-the-art LLMs of varying scales on the CodeChef dataset and analyze how different input configurations affect the correctness, bug detection capability, and code coverage of the generated unit tests. Our findings indicate that combining code with clear requirements produces the highest-quality test cases, whereas generating tests from incorrect code alone yields significantly poorer results.
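
To make the notion of an input configuration concrete, here is a minimal sketch of how prompt variants might be assembled from the code under test and an optional requirement. The problem statement, prompt wording, and function names are illustrative assumptions, not material from the thesis or the CodeChef dataset.

```python
from itertools import product
from typing import Optional

# Hypothetical problem and solutions; stand-ins for a CodeChef-style
# task, not examples drawn from the dataset itself.
REQUIREMENT = "Given a list of integers, return their sum."
CORRECT_CODE = "def solve(nums):\n    return sum(nums)"
BUGGY_CODE = "def solve(nums):\n    return sum(nums[1:])  # drops the first element"

def build_prompt(code: str, requirement: Optional[str]) -> str:
    """Compose a test-generation prompt from the code under test and,
    optionally, its natural-language requirement."""
    parts = ["Write pytest unit tests for the following function."]
    if requirement is not None:
        parts.append("Requirement: " + requirement)
    parts.append("Code under test:\n" + code)
    return "\n\n".join(parts)

# Enumerate four configurations: {correct, buggy} code x {with, without} requirement.
for code, requirement in product([CORRECT_CODE, BUGGY_CODE], [REQUIREMENT, None]):
    label = ("correct" if code is CORRECT_CODE else "buggy",
             "with_req" if requirement else "no_req")
    print(label, "->", len(build_prompt(code, requirement)), "chars")
```

Each variant roughly corresponds to one cell of the input-configuration space the study examines (requirement availability crossed with code correctness).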

The second study examines which features of bug reports influence test generation agents' ability to generate bug-reproducing tests. We identify a set of salient features in bug reports and manually annotate 709 reports from the SWT-Bench Lite and Verified datasets based on the presence of these features. Using selected agents from the SWT-Bench leaderboards, we analyze their test generation capabilities and examine overlaps in the bug reports they resolve. We further use the annotated dataset to assess how these features affect agent performance and conduct statistical analyses to determine their relative importance in enabling agents to generate bug-reproducing tests. Our findings indicate that including natural-language solutions, localization information, and descriptions of correct behavior is associated with improved performance in bug-reproducing test generation. Moreover, different agents prioritize different features, reflecting variations in their underlying architectures and LLMs.
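
As a rough illustration of this kind of feature analysis, the sketch below applies Fisher's exact test to a 2x2 contingency table relating one binary bug-report feature to whether an agent produced a bug-reproducing test. The data are synthetic and the choice of test is an assumption; it is not necessarily the statistical method used in the thesis.

```python
from scipy.stats import fisher_exact

# Synthetic annotations: (feature_present, bug_reproduced) per report.
# Real annotations would come from the 709 labeled SWT-Bench reports.
reports = [
    (True, True), (True, True), (True, False), (True, True),
    (False, False), (False, True), (False, False), (False, False),
]

# Build the 2x2 contingency table for this feature.
# Rows: feature present / absent; columns: test reproduced bug / did not.
table = [[0, 0], [0, 0]]
for feature, success in reports:
    table[0 if feature else 1][0 if success else 1] += 1

odds_ratio, p_value = fisher_exact(table)
print("contingency table:", table)
print(f"odds ratio = {odds_ratio:.2f}, p = {p_value:.3f}")
```

Repeating such a test per feature (or fitting a model over all features jointly) is one plausible way to rank the relative importance of bug-report features.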

Together, these studies provide critical insights into how different types of information influence the effectiveness of LLMs and test generation agents. Readers can use these insights to guide the effective use of current LLMs and agents, as well as to inform the design and development of future systems.


To attend this master’s thesis presentation in person, please go to DC 2310. You can also attend virtually on Zoom.