Please note: This master’s thesis presentation will take place in DC 2314 and online.
Sherman Siu, Master’s candidate
David R. Cheriton School of Computer Science
Supervisors: Professors Lesley Istead, Jeff Orchard
Recent advances in large language models have enabled natural language to serve as a flexible interface for controlling complex systems, but existing approaches often require large-scale multimodal training or sacrifice domain-specific inductive biases. In structured decision-making domains such as chess, specialized models achieve strong performance but lack high-level semantic controllability, while prompt-conditioned approaches are more flexible but typically exhibit weaker domain grounding.
In this thesis, we study prompt-conditioned policy modulation for chess by adapting a pretrained neural policy network using natural language prompts. We propose UniMaia, a framework that combines a frozen Lc0-based chess policy network with a LoRA-adapted text encoder and a ControlNet-style conditioning mechanism. This design enables semantic control over gameplay, including opening selection and player strength, while preserving the underlying representations of the base model. We further introduce UniMaia-Aux, an extension that incorporates auxiliary temporal conditioning and behavioral prediction objectives.
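The idea of modulating a frozen policy through a ControlNet-style branch can be illustrated with a toy sketch. This is not the UniMaia implementation: the class `ZeroInitConditioner`, the function `conditioned_policy`, the three-move policy, and the two-dimensional prompt embedding are all hypothetical, and the offset is applied directly to move logits for brevity, whereas ControlNet-style designs typically inject into intermediate features. The sketch only shows the zero-initialization property: before training, the conditioned policy equals the frozen base policy.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


class ZeroInitConditioner:
    """Hypothetical conditioning branch: prompt embedding -> per-move logit offset.

    Weights start at zero, so the branch contributes nothing until trained.
    """

    def __init__(self, num_moves, embed_dim):
        self.weights = [[0.0] * embed_dim for _ in range(num_moves)]

    def __call__(self, prompt_embedding):
        return [
            sum(w * v for w, v in zip(row, prompt_embedding))
            for row in self.weights
        ]


def conditioned_policy(base_logits, conditioner, prompt_embedding):
    """Add the conditioning offsets to the frozen base logits, then normalize."""
    offsets = conditioner(prompt_embedding)
    return softmax([b + o for b, o in zip(base_logits, offsets)])


# Toy values (assumptions): logits from a frozen base policy over three moves,
# and an embedding produced by some text encoder.
base_logits = [2.0, 1.0, 0.5]
prompt_embedding = [0.3, -0.7]

conditioner = ZeroInitConditioner(num_moves=3, embed_dim=2)
unconditioned = softmax(base_logits)
at_init = conditioned_policy(base_logits, conditioner, prompt_embedding)

# Stand-in for training: only the conditioner's weights change, never base_logits.
conditioner.weights[2] = [4.0, -4.0]
after_update = conditioned_policy(base_logits, conditioner, prompt_embedding)
preferred_move = after_update.index(max(after_update))
```

Here `at_init` matches `unconditioned`, and after the simulated update the prompt shifts the policy toward the third move while the base logits stay untouched.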
To support this work, we construct a large-scale, metadata-augmented version of the Lichess dataset, introduce a semi-automated pipeline for generating natural language prompt templates, and propose evaluation benchmarks spanning both prompt-conditioned and metadata-conditioned settings.
Empirically, UniMaia achieves competitive or superior performance relative to prior work across multiple benchmarks. It attains the highest top-move accuracy on prompt-conditioned benchmarks while remaining competitive with metadata-conditioned models on human move prediction tasks. Prompt-conditioned models perform strongly in frequency-dominated regimes, such as common openings and highly active player behavior, whereas metadata-conditioned models generally achieve stronger expected accuracy. UniMaia bridges these approaches by combining strong domain-specific inductive biases with flexible prompt-based control.
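The distinction between the two metrics can be made concrete with a small sketch. The abstract does not define them, so the definitions below are assumptions: top-move accuracy is taken as the fraction of positions where the highest-probability move matches the move actually played, and expected accuracy as the mean probability the model assigns to the played move. The function names and the toy predictions are hypothetical.

```python
def top_move_accuracy(predictions, played):
    """Fraction of positions where the argmax move equals the played move."""
    hits = sum(
        1
        for probs, move in zip(predictions, played)
        if max(probs, key=probs.get) == move
    )
    return hits / len(played)


def expected_accuracy(predictions, played):
    """Mean probability assigned to the played move (assumed definition)."""
    total = sum(probs.get(move, 0.0) for probs, move in zip(predictions, played))
    return total / len(played)


# Toy data: each dict maps a move in UCI notation to its predicted probability.
played = ["e2e4", "c2c4"]
sharp_model = [
    {"e2e4": 0.9, "d2d4": 0.1},
    {"g1f3": 0.9, "c2c4": 0.1},
]
spread_model = [
    {"e2e4": 0.6, "d2d4": 0.4},
    {"g1f3": 0.55, "c2c4": 0.45},
]

sharp_top = top_move_accuracy(sharp_model, played)
spread_top = top_move_accuracy(spread_model, played)
sharp_expected = expected_accuracy(sharp_model, played)
spread_expected = expected_accuracy(spread_model, played)
```

Under these assumed definitions, both toy models get one of two top moves right, yet the second assigns more probability mass to the moves actually played. This illustrates how a model can lead on one metric without leading on the other.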
UniMaia-Aux further demonstrates that auxiliary temporal conditioning can improve expected accuracy and behavioral modeling across several evaluation settings, although this introduces trade-offs between top-move accuracy and dependence on temporally structured information.
Overall, this work demonstrates that prompt-conditioned control of domain-specific policy networks is feasible without end-to-end multimodal training. At the same time, the results highlight ongoing challenges related to prompt sensitivity, policy calibration, robustness, and the trade-offs between controllability and predictive performance in prompt-conditioned decision-making systems.
To attend this master’s thesis presentation in person, please go to DC 2314. You can also attend virtually on Zoom.