The performance of the simplex method on deterministic Markov decision processes
Speaker: | Ian Post |
---|---|
Affiliation: | University of Waterloo |
Room: | Mathematics and Computer Building (MC) 5158 |
Abstract:
Motivated
by
the
search
for
strongly-polynomial
algorithms
for
linear
programming
(an
algorithm
whose
runtime
depends
only
on
the
number
of
constraints
and
variables,
not
the
sizes
of
the
numbers
involved),
we
analyze
the
performance
of
the
simplex
method
on
Markov
decision
processes
(MDPs),
a
class
of
LPs
that
is
"close"
to
being
strongly-polynomial
but
beyond
the
reach
of
existing
techniques.
We
prove
that
for
deterministic
MDPs
the
simplex
method
with
the
highest
gain/most-negative-reduced
cost
pivoting
rule
converges
in
strongly-polynomial
time,
regardless
of
the
discount
factor
of
the
MDP
and
even
when
each
action
has
a
distinct
discount.
Previously
the
simplex
method
was
known
to
run
in
polynomial
time
only
for
discounted
MDPs
where
the
discount
was
bounded
away
from
one.
Unlike
in
the
discounted
case
the
algorithm
does
not
greedily
converge
to
the
optimum,
and
we
require
a
more
complex
measure
of
progress.
We
identify
a
set
of
layers
in
which
the
values
of
variables
must
lie
and
show
that
the
simplex
method
always
makes
progress
optimizing
the
variables
in
one
layer.
Joint
work
with
Yinyu
Ye.