Anudeep Das, a graduate student at the Cheriton School of Computer Science, has received a US$58,100 grant from Open Philanthropy to support his research on the security of large language models (LLMs). His project focuses on developing stealthy and resilient backdoors in LLMs, an emerging area of research as these models become more widely used.
Anudeep is collaborating with master’s students Prach Chantasantitam and Gurjot Singh. Together, they will explore how attackers could exploit the open-ended, long-form nature of LLM outputs, along with the frequent updates these models undergo, to create backdoors that are difficult to detect, remove and defend against.
Their work investigates how white-box attackers (adversaries with complete access to a model’s architecture and parameters) can fine-tune an LLM so that malicious outputs are triggered by the model’s own previously generated text. This self-triggering design makes the backdoors challenging to detect and eliminate.
Read the full article from Computer Science to learn more.