PhD Seminar • Natural Language Processing • Information Seeking Beyond English

Wednesday, March 5, 2025 12:00 pm - 1:00 pm EST (GMT -05:00)

Please note: This PhD seminar will take place in DC 1304.

Xinyu Crystina Zhang, PhD candidate
David R. Cheriton School of Computer Science

Supervisor: Professor Jimmy Lin

Pretrained language models have brought revolutionary progress to information seeking in English. While this advance is exciting, transferring it to non-English languages, especially lower-resource ones, presents new challenges that require developing new resources and methodologies.

In this talk, I will present my research on building effective information-seeking systems for non-English speakers. I will begin by introducing the benchmarks and datasets developed to support the evaluation and training of multilingual search systems. These resources have since been widely adopted within the community and have enabled the development of effective multilingual embedding models. The next part of the talk will share the best training practices we found in developing such models, including strategies for enhancing backbone models and surprising transfer effects across languages. Building on these foundations, my work has expanded to investigate how language models process multilingual text and facilitate knowledge transfer across languages.

The talk will conclude with a vision for the future of multilingual language model development: adapting these models to unseen languages with minimal data and resource requirements, thereby bridging the gap for underrepresented linguistic communities.