Xin Lian, Master’s candidate
David R. Cheriton School of Computer Science
The problem of language alignment has long been an exciting topic for Natural Language Processing researchers. Current methods for learning cross-domain correspondences at the word level rely on distributed representations of words, and recent developments in computational linguistics and neural language modeling have given rise to the so-called zero-shot learning paradigm. Many algorithms have been proposed to solve the bilingual alignment problem in supervised or unsupervised manners. A natural way to extend bilingual alignment to the multilingual setting is to pick one language as a pivot to translate through. However, pivoting through a single language assumes a transitive relation among all pairs of languages, which is not enforced during the training of bilingual tasks. Consequently, translating through an uninformed pivot language degrades the quality of translation.
Motivated by the observation that using information from other languages during training helps improve the translation of individual language pairs, we propose a new algorithm for unsupervised multilingual alignment that employs the barycenter of all language word embeddings as a new pivot through which to infer translations. The barycenter is closely related to the joint mapping of all input languages and hence encapsulates all information useful for translation. Finally, we evaluate our method by jointly aligning word vectors in 7 European languages, demonstrating a noticeable improvement over the current state of the art.
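To make the pivot idea concrete, the following is a minimal illustrative sketch, not the thesis' actual algorithm: each language's embeddings are mapped onto a common space with a closed-form orthogonal Procrustes solution, and the barycenter is taken as the simple Euclidean mean of the mapped embeddings. The function names (`procrustes`, `barycenter_pivot`) and the assumption that rows of the embedding matrices are already in corresponding order are hypothetical simplifications for this example.

```python
import numpy as np

def procrustes(src, tgt):
    """Orthogonal map W minimizing ||src @ W - tgt||_F (closed form via SVD)."""
    u, _, vt = np.linalg.svd(src.T @ tgt)
    return u @ vt

def barycenter_pivot(embeddings):
    """Map every language onto the first one, then average row-wise.

    embeddings: list of (n_words, dim) arrays whose rows are assumed to
    correspond across languages (a simplifying assumption of this sketch).
    """
    ref = embeddings[0]
    mapped = [ref] + [e @ procrustes(e, ref) for e in embeddings[1:]]
    return np.mean(mapped, axis=0)

# Toy demo: two "languages" that are orthogonal rotations of one point cloud.
rng = np.random.default_rng(0)
base = rng.normal(size=(100, 50))
q1, _ = np.linalg.qr(rng.normal(size=(50, 50)))
q2, _ = np.linalg.qr(rng.normal(size=(50, 50)))
pivot = barycenter_pivot([base @ q1, base @ q2])
```

Because the two toy languages differ only by an orthogonal rotation, Procrustes recovers the rotation exactly and the barycenter coincides with the reference space; with real, noisy embeddings the barycenter instead averages out language-specific distortions, which is the property the proposed method exploits.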