Please note: This PhD defence will take place in DC 2310.
Kira Selby, PhD candidate
David R. Cheriton School of Computer Science
Supervisor: Professor Pascal Poupart
This thesis is an investigation of the powerful and flexible applications of analyzing empirical distributions of vectors within latent spaces.
These methods have historically been applied with great success to the domain of word embeddings, leading to improvements in robustness against polysemy and to the unsupervised inference of hierarchical relationships between words, and have even been used to shatter existing benchmarks on unsupervised translation.
This work will serve to extend these existing lines of inquiry, with a focus on two key areas of further research:
- Probabilistic approaches to robustness in natural language.
- Approximating general distance functions between distributions in order to infer hierarchical relationships between words from their distributions over contexts.
Motivated by these initial research directions, the resulting investigations will then demonstrate novel and significant contributions to a diverse range of problems across many different fields and domains — far beyond the narrow scope of word embeddings. The key contributions of this work are threefold:
- Proposing a probabilistic, model-agnostic framework for robustness in natural language models. The proposed framework improves performance on a wide range of downstream tasks compared to existing baselines.
- Constructing a general architecture for modelling distance functions between multiple permutation-invariant sets. The proposed architecture is proven to be a universal approximator for all "partially permutation-invariant" functions, outperforms all existing baselines on a number of set-based tasks, and can approximate distance functions such as KL divergence and mutual information.
- Leveraging this architecture to define a novel, set-based approach to few-shot image generation. The proposed approach outperforms all existing image-to-image baselines without making restrictive assumptions about the structure of the training and evaluation sets that might limit its ability to generalize, making it a promising candidate for scaling to true zero-shot generation.
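To illustrate the notion of permutation invariance underlying the second contribution, the following is a minimal Deep Sets-style sketch (not the thesis's actual architecture): a per-element map is pooled by a sum, so the output cannot depend on the order of the set's elements. The names `phi`, `rho`, and `set_function` are illustrative placeholders.

```python
import numpy as np

def phi(x):
    # Per-element embedding; a fixed nonlinearity stands in for a learned network.
    return np.tanh(x)

def rho(z):
    # Post-pooling map applied to the aggregated representation.
    return z ** 2

def set_function(elements):
    # Sum pooling makes the result independent of element order,
    # which is exactly the permutation-invariance property.
    pooled = sum(phi(np.asarray(x)) for x in elements)
    return rho(pooled)

# Reordering the input set leaves the output unchanged.
assert np.allclose(set_function([1.0, 2.0, 3.0]),
                   set_function([3.0, 1.0, 2.0]))
```

In a learned model, `phi` and `rho` would be neural networks; the sum (or mean/max) pooling step is what guarantees invariance regardless of how those networks are parameterized.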