MC 6460

Speaker: Michael Hynes, Department of Applied Mathematics, University of Waterloo

Title: A Taylor polynomial expansion line search for large-scale optimization

Abstract:

In trying to cope with the Big Data deluge, the landscape of distributed computing has changed. Large commodity hardware clusters, typically operating in some form of MapReduce framework, are becoming prevalent for organizations that require both tremendous storage capacity and fault tolerance. However, the high cost of communication can dominate the computation time in large-scale optimization routines in these frameworks.

This thesis considers the problem of how to efficiently conduct univariate line searches in commodity clusters in the context of gradient-based batch optimization algorithms, such as the staple limited-memory BFGS (LBFGS) method. In it, a new line search technique is proposed for cases where the underlying objective function is analytic, as in logistic regression and low-rank matrix factorization.

The technique approximates the objective function by a truncated Taylor polynomial along a fixed search direction. The coefficients of this polynomial may be computed efficiently in parallel with far less communication than is needed to transmit the high-dimensional gradient vector, after which the polynomial may be minimized to high accuracy in a neighbourhood of the expansion point without distributed operations.

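Concretely, in notation of our own choosing (the abstract itself fixes none): for an iterate x and fixed search direction p, PELS works with the univariate restriction \phi(\alpha) = f(x + \alpha p) and its degree-K truncated Taylor expansion about \alpha = 0,

    \phi(\alpha) \approx T_K(\alpha) = \sum_{k=0}^{K} \frac{\phi^{(k)}(0)}{k!} \alpha^k,

so that only the K + 1 scalar coefficients \phi^{(k)}(0)/k!, rather than a d-dimensional gradient vector, need to be accumulated across the cluster before T_K is minimized on a single machine near \alpha = 0.
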
This Polynomial Expansion Line Search (PELS) may be invoked iteratively until the expansion point and minimum are sufficiently accurate, and can provide substantial savings in time and communication costs when multiple iterations in the line search procedure are required.

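To make the iterative procedure concrete, the following is a minimal single-machine sketch for the logistic loss, assuming a quartic (K = 4) truncation and a unit trust region; the function names and the re-expansion loop below are illustrative choices of ours, not details taken from the thesis.

    import numpy as np

    def taylor_coeffs(w, p, A, y, K=4):
        """Coefficients c_k = phi^(k)(0)/k! of phi(a) = f(w + a*p) for the
        logistic loss f(w) = sum_i log(1 + exp(-y_i * (a_i . w))).  Every c_k
        is a sum of per-example scalars, which is what makes a low-traffic
        distributed reduction possible."""
        u = y * (A @ w)                     # margins at the expansion point
        v = y * (A @ p)                     # directional margins
        q = 1.0 / (1.0 + np.exp(u))        # sigma(-u)
        g = [np.log1p(np.exp(-u)),         # g(u) = log(1 + e^{-u})
             -q,                           # g'(u)
             q * (1 - q),                  # g''(u)
             -q * (1 - q) * (1 - 2 * q),   # g'''(u)
             q * (1 - q) * ((1 - 2 * q)**2 - 2 * q * (1 - q))]  # g''''(u)
        fact = [1.0, 1.0, 2.0, 6.0, 24.0]
        return np.array([np.sum(g[k] * v**k) / fact[k] for k in range(K + 1)])

    def minimize_poly(c, radius=1.0):
        """Minimize T(a) = sum_k c[k]*a^k over [-radius, radius] by checking
        the real stationary points of T and the interval endpoints."""
        dT = np.array([k * c[k] for k in range(len(c) - 1, 0, -1)])  # T'
        cands = [r.real for r in np.roots(dT) if abs(r.imag) < 1e-9]
        cands = [a for a in cands if abs(a) <= radius] + [-radius, radius]
        return min(cands, key=lambda a: np.polyval(c[::-1], a))

    # One optimizer step on toy data: re-expand until the correction is tiny.
    rng = np.random.default_rng(0)
    A = rng.normal(size=(200, 10))
    y = np.sign(rng.normal(size=200))
    w = np.zeros(10)
    q = 1.0 / (1.0 + np.exp(y * (A @ w)))
    p = A.T @ (q * y)                      # steepest-descent direction
    alpha = 0.0
    for _ in range(5):
        step = minimize_poly(taylor_coeffs(w + alpha * p, p, A, y))
        alpha += step
        if abs(step) < 1e-8 * max(1.0, abs(alpha)):
            break
    w += alpha * p
    print("accepted step size:", alpha)

In a cluster setting, only the call to taylor_coeffs touches the full data set; the polynomial minimization itself involves no distributed operations.
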
Three applications of the PELS technique are presented herein for important classes of analytic functions: (i) Logistic Regression (LR), (ii) low-rank Matrix Factorization (MF) models, and (iii) the feedforward multilayer perceptron (MLP).

In addition, for LR and MF, implementations of PELS in the Apache Spark framework for fault-tolerant cluster computing are provided. These implementations conferred significant convergence enhancements to their respective algorithms, and will be of interest to Spark and Hadoop practitioners.

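As one hypothetical illustration of the aggregation pattern such an implementation can exploit (a sketch only, not the thesis code; the PySpark program below, including its helper names and toy data, is our own), the coefficient sums can be formed with a single treeAggregate, so that only K + 1 scalars per task cross the network:

    import numpy as np
    from pyspark.sql import SparkSession

    K = 4
    FACT = np.array([1.0, 1.0, 2.0, 6.0, 24.0])

    def add_example(acc, row):
        """seqOp: fold one example's contribution g^(k)(u) v^k / k!
        into the running vector of K+1 coefficient sums."""
        a, y = row
        u = y * float(a @ w_b.value)       # margin at the expansion point
        v = y * float(a @ p_b.value)       # directional margin
        q = 1.0 / (1.0 + np.exp(u))
        g = np.array([np.log1p(np.exp(-u)), -q, q * (1 - q),
                      -q * (1 - q) * (1 - 2 * q),
                      q * (1 - q) * ((1 - 2 * q)**2 - 2 * q * (1 - q))])
        return acc + g * v ** np.arange(K + 1) / FACT

    spark = SparkSession.builder.appName("pels-sketch").getOrCreate()
    sc = spark.sparkContext

    rng = np.random.default_rng(0)
    data = [(rng.normal(size=10), float(np.sign(rng.normal())))
            for _ in range(1000)]
    points = sc.parallelize(data, numSlices=8).cache()

    w_b = sc.broadcast(np.zeros(10))       # current iterate
    p_b = sc.broadcast(np.ones(10) / 10.0) # fixed search direction

    # One small reduction: a length-(K+1) vector per task, versus a
    # length-d gradient for each probe of a conventional line search.
    coeffs = points.treeAggregate(np.zeros(K + 1), add_example, np.add)
    print("Taylor coefficients:", coeffs)  # polynomial minimized on the driver
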
For instance, the Spark PELS technique reduced the number of iterations and the time required by LBFGS to reach terminal training accuracies for LR models by factors of 1.8 to 2. Substantial acceleration was also observed for the Nonlinear Conjugate Gradient algorithm applied to MLP models, which is an interesting case for future study in optimization for neural network models.

The PELS technique is applicable to a broad class of models for Big Data processing and large-scale optimization, and can be a useful component of batch optimization routines.