Abstract

Semantic segmentation tasks require expensive and time-consuming pixel-level annotations. Unsupervised domain adaptation (UDA) aims to transfer knowledge from a label-rich source domain to an unlabeled target domain. Recently, vision-language models (VLMs) have shown promise for domain-adaptive classification, but remain under-explored for domain-adaptive semantic segmentation (DASS). Existing language-guided DASS methods align pixel-level features with generic class-wise prompts that require target-domain knowledge, and they do not leverage the intricate spatial relationships and object context endowed by language priors. In this work, we propose LangDA, the first domain-agnostic approach to explicitly induce context-awareness in language-driven DASS. LangDA aligns image features with VLM-generated, context-aware scene descriptions via a consistency objective, and achieves state-of-the-art results on three adaptation benchmarks, outperforming existing methods by 3.9%, 2.6%, and 1.4%.
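
To give a rough sense of what aligning image features with a scene description via a consistency objective can look like, the following is a minimal sketch only, not the LangDA implementation presented in the seminar: it assumes pooled encoder features and a text embedding of a VLM-generated description, and uses a simple cosine-similarity consistency loss. The names `scene_consistency_loss`, `image_feats`, and `caption_emb` are illustrative placeholders.

```python
# Minimal sketch (assumed, not the paper's method): cosine-similarity consistency
# between pooled image features and a scene-description text embedding.
import torch
import torch.nn.functional as F


def scene_consistency_loss(image_feats: torch.Tensor,
                           caption_emb: torch.Tensor) -> torch.Tensor:
    """image_feats: (B, C, H, W) dense features from a segmentation encoder.
    caption_emb: (B, C) text embedding of a context-aware scene description
    (assumed here to already share the feature dimension C)."""
    # Pool dense features into one global vector per image.
    pooled = image_feats.mean(dim=(2, 3))                  # (B, C)
    pooled = F.normalize(pooled, dim=-1)
    caption_emb = F.normalize(caption_emb, dim=-1)
    # Consistency objective: maximize cosine similarity, i.e. minimize 1 - cos.
    return (1.0 - (pooled * caption_emb).sum(dim=-1)).mean()


if __name__ == "__main__":
    feats = torch.randn(2, 512, 32, 32)   # dummy encoder features
    text = torch.randn(2, 512)            # dummy description embeddings
    print(scene_consistency_loss(feats, text).item())
```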

Presenter

Chang Liu, MASc candidate in Systems Design Engineering

Join online or in person.

Attending this seminar will count towards the graduate student seminar attendance milestone!