Please note: This master’s thesis presentation will take place in DC 2314.
Jess Gano, Master's candidate
David R. Cheriton School of Computer Science
Supervisor: Professor Jesse Hoey
Identity, as a concept, is concerned with the social positioning of the self and the other. It manifests through discourse and interactions and is expressed in relation to other perceived identities. For example, can one be or talk as a leader without strictly categorizing those they interact with as subordinates or employees?
Research shows that the onset and progression of dementia may undermine the individual's sense of self and identity. This loss of self or identity has been found not only to cause a significant decrease in well-being, but also to affect caregiver/care-recipient relationships. However, while identity is compromised in some way, this does not necessarily mean it is completely lost. Autobiographical stories, especially those told repeatedly, may serve as a means to reveal significant aspects of the storyteller's self and identity.
In this thesis, we explore the task of persona attribute extraction from dialogues as a proxy for identity cues. We define a persona attribute as a triplet in the format (subject, relation, object), e.g., (I, has_hobby, knitting).
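As an illustration only, the triplet format described above could be represented as follows. This is a minimal sketch: the class name `PersonaAttribute` and the helper `parse_triplet` are hypothetical and are not taken from the thesis.

```python
from typing import NamedTuple


class PersonaAttribute(NamedTuple):
    """A (subject, relation, object) triplet. Hypothetical name, for illustration."""
    subject: str
    relation: str
    object: str


def parse_triplet(text: str) -> PersonaAttribute:
    """Parse a string such as "(I, has_hobby, knitting)" into a triplet.

    Assumes exactly three comma-separated fields and no commas inside a field.
    """
    fields = [part.strip() for part in text.strip().strip("()").split(",")]
    if len(fields) != 3:
        raise ValueError(f"expected 3 fields, got {len(fields)}: {text!r}")
    return PersonaAttribute(*fields)


# The example triplet from the abstract.
attribute = parse_triplet("(I, has_hobby, knitting)")
```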
Employing an information extraction approach, we design a two-stage persona attribute extractor consisting of a relation predictor and an entity extractor. We define relation prediction as a multi-label classification task using BERT embeddings and feedforward neural networks, and entity extraction as a template infilling task following the pre-training objective of T5 (Raffel et al., 2020). We employ our methods on a proxy dataset created by combining Persona-Chat and Dialogue-NLI.
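The two stages can be sketched roughly as below. This is not the thesis implementation: the relation label set (apart from has_hobby), the 0.5 decision threshold, the template wording, and all function names are assumptions made for illustration, and the logits stand in for the output of a feedforward classifier over BERT embeddings. The only T5-specific detail relied on is its sentinel-token convention (`<extra_id_0>`, `<extra_id_1>`, ...), which marks the span to be filled in.

```python
import math

# Hypothetical label set; only "has_hobby" appears in the abstract.
RELATIONS = ["has_hobby", "has_profession", "like_food"]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def predict_relations(logits, threshold=0.5):
    """Stage 1 (multi-label): each relation is decided independently,
    so an utterance may receive zero, one, or several relations."""
    return [rel for rel, z in zip(RELATIONS, logits) if sigmoid(z) >= threshold]


def build_infill_prompt(utterance: str, relation: str) -> str:
    """Stage 2 input: a template whose object slot is a T5 sentinel token.
    The template wording here is an assumption, not the thesis's template."""
    return f"{utterance} (I, {relation}, <extra_id_0>)"


def parse_infill_output(generated: str) -> str:
    """Extract the text a T5-style model places between the first two sentinels."""
    after_first = generated.split("<extra_id_0>", 1)[-1]
    return after_first.split("<extra_id_1>", 1)[0].strip()


# Stand-in logits for one utterance (made up for this example).
relations = predict_relations([2.0, -1.5, 0.3])
prompts = [build_infill_prompt("I knit every evening.", r) for r in relations]
```

In a full pipeline, each prompt would be passed to a fine-tuned T5 model and its output decoded with `parse_infill_output` to obtain the object of the triplet.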
Given ethical considerations and potential risks, directly evaluating our methods on a dementia use-case is not feasible. Therefore, we utilize a dataset consisting of interviews with older adults to assess feasibility within a context more closely resembling the dementia use-case.
Exploring the research problem and developing our methodology highlighted the following insights: (1) inferring identities from text, especially considering their nuanced representation in discourse, is challenging due to the abstract nature of identity itself; and (2) to our knowledge, there is no available dataset that exhibits the distinct speech characteristics of older adults, which makes training and evaluating models tailored to this demographic very challenging. Furthermore, experiments on the older adults dataset show that a transfer learning approach to this problem is insufficient due to the significant contrast between the datasets from the source and target domains.