UniMaia: Steering Chess Policies with Language for Human-like Play
Master’s Thesis Presentation • Artificial Intelligence | Machine Learning • Cheriton School of Computer Science

Thursday, June 25, 2026 1:00 pm - 2:00 pm EDT (GMT -04:00)

Please note: This master’s thesis presentation will take place in DC 2314 and online.

Sherman Siu, Master’s candidate
David R. Cheriton School of Computer Science

Supervisors: Professors Lesley Istead, Jeff Orchard

Recent advances in large language models have enabled natural language to serve as a flexible interface for controlling complex systems, but such language-driven control often requires large-scale multimodal training or sacrifices domain-specific inductive biases. In structured decision-making domains such as chess, specialized models achieve strong performance but lack high-level semantic controllability, while prompt-conditioned approaches are more flexible but typically exhibit weaker domain grounding.

In this thesis, we study prompt-conditioned policy modulation for chess by adapting a pretrained neural policy network using natural language prompts. We propose UniMaia, a framework that combines a frozen Lc0-based chess policy network with a LoRA-adapted text encoder and a ControlNet-style conditioning mechanism. This design enables semantic control over gameplay, including opening selection and player strength, while preserving the underlying representations of the base model. We further introduce UniMaia-Aux, an extension that incorporates auxiliary temporal conditioning and behavioral prediction objectives.
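The property that makes this design attractive is that the base model's behaviour is preserved at the start of training. The toy sketch below illustrates the general idea under stated assumptions; it is not the thesis's implementation. The announcement does not give layer shapes, injection points, or how the text encoder works, so every name here is hypothetical: `policy_trunk` is a single-residual-block stand-in for the frozen Lc0 network, `encode_prompt` is a mean-pooled linear "text encoder" with a LoRA update, and `W_ctrl` is a single zero-initialized projection. That projection is a simplification of ControlNet, which in its original form uses a trainable copy of encoder blocks joined through zero-initialized convolutions.

```python
import numpy as np

rng = np.random.default_rng(0)
HIDDEN, TEXT_DIM, RANK = 16, 8, 2

# Frozen weights standing in for the pretrained chess policy trunk (hypothetical).
W_trunk = rng.normal(size=(HIDDEN, HIDDEN)) / np.sqrt(HIDDEN)
# Frozen weights standing in for one layer of a pretrained text encoder (hypothetical).
W_text = rng.normal(size=(TEXT_DIM, TEXT_DIM)) / np.sqrt(TEXT_DIM)

# LoRA: the encoder uses W_text + A @ B; B starts at zero, so the encoder is unchanged at init.
lora_A = rng.normal(scale=0.01, size=(TEXT_DIM, RANK))
lora_B = np.zeros((RANK, TEXT_DIM))

# ControlNet-style "zero" projection from the prompt embedding into the trunk's residual stream.
W_ctrl = np.zeros((TEXT_DIM, HIDDEN))


def encode_prompt(tokens):
    """Toy LoRA-adapted encoder: linear map of each token, then mean pooling."""
    return (tokens @ (W_text + lora_A @ lora_B)).mean(axis=0)


def policy_trunk(board, prompt_emb=None):
    """One frozen residual block; the projected prompt signal is added to its output."""
    h = board + np.tanh(board @ W_trunk)
    if prompt_emb is not None:
        h = h + prompt_emb @ W_ctrl
    return h


board = rng.normal(size=HIDDEN)          # stand-in for an encoded chess position
prompt = rng.normal(size=(5, TEXT_DIM))  # stand-in for 5 embedded prompt tokens

base = policy_trunk(board)
conditioned = policy_trunk(board, encode_prompt(prompt))
print(np.allclose(base, conditioned))  # True: zero init leaves the base policy intact

# Simulate an update to the control projection, as training would produce.
W_ctrl += 0.1 * rng.normal(size=W_ctrl.shape)
print(np.allclose(base, policy_trunk(board, encode_prompt(prompt))))  # False: the prompt now steers the output
```

Zero initialization does not stall learning. The gradient of the block output with respect to `W_ctrl` is the outer product of the prompt embedding and the upstream gradient, so it is nonzero even though `W_ctrl` starts at zero. The frozen trunk weights are never modified.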

To support this work, we construct a large-scale, metadata-augmented version of the Lichess dataset, introduce a semi-automated pipeline for generating natural language prompt templates, and propose evaluation benchmarks spanning both prompt-conditioned and metadata-conditioned settings.

Empirically, UniMaia achieves competitive or superior performance relative to prior work across multiple benchmarks. It attains the highest top-move accuracy on prompt-conditioned benchmarks while remaining competitive with metadata-conditioned models on human move prediction tasks. Prompt-conditioned models perform strongly in frequency-dominated regimes, such as common openings and the behavior of highly active players, whereas metadata-conditioned models generally achieve higher expected accuracy. UniMaia bridges these approaches by combining strong domain-specific inductive biases with flexible prompt-based control.

UniMaia-Aux further demonstrates that auxiliary temporal conditioning can improve expected accuracy and behavioral modeling across several evaluation settings, although this introduces trade-offs between top-move accuracy and dependence on temporally structured information.

Overall, this work demonstrates that prompt-conditioned control of domain-specific policy networks is feasible without end-to-end multimodal training. At the same time, the results highlight ongoing challenges related to prompt sensitivity, policy calibration, robustness, and the trade-offs between controllability and predictive performance in prompt-conditioned decision-making systems.

To attend this master’s thesis presentation in person, please go to DC 2314. You can also attend virtually on Zoom.

Location Information

Location Address: DC - William G. Davis Computer Research Centre
200 University Avenue West
Waterloo, ON, CA N2L 3G1
Hybrid: DC 2314 | Online master’s thesis presentation
