The performance of the simplex method on deterministic Markov decision processes
|Affiliation:||University of Waterloo|
|Room:||Mathematics and Computer Building (MC) 5158|
Motivated by the search for strongly-polynomial algorithms for linear programming (an algorithm whose runtime depends only on the number of constraints and variables, not the sizes of the numbers involved), we analyze the performance of the simplex method on Markov decision processes (MDPs), a class of LPs that is "close" to being strongly-polynomial but beyond the reach of existing techniques. We prove that for deterministic MDPs the simplex method with the highest gain/most-negative-reduced cost pivoting rule converges in strongly-polynomial time, regardless of the discount factor of the MDP and even when each action has a distinct discount. Previously the simplex method was known to run in polynomial time only for discounted MDPs where the discount was bounded away from one. Unlike in the discounted case the algorithm does not greedily converge to the optimum, and we require a more complex measure of progress. We identify a set of layers in which the values of variables must lie and show that the simplex method always makes progress optimizing the variables in one layer.
Joint work with Yinyu Ye.
200 University Avenue West
Waterloo, ON N2L 3G1