<?xml version="1.0" encoding="UTF-8"?><xml><records><record><source-app name="Biblio" version="7.x">Drupal-Biblio</source-app><ref-type>17</ref-type><contributors><authors><author><style face="normal" font="default" size="100%">Taleghan, Majid Alkaee</style></author><author><style face="normal" font="default" size="100%">Dietterich, Thomas G.</style></author><author><style face="normal" font="default" size="100%">Crowley, Mark</style></author><author><style face="normal" font="default" size="100%">Hall, Kim</style></author><author><style face="normal" font="default" size="100%">Albers, H. Jo</style></author></authors></contributors><titles><title><style face="normal" font="default" size="100%">PAC Optimal MDP Planning with Application to Invasive Species Management</style></title><secondary-title><style face="normal" font="default" size="100%">Journal of Machine Learning Research</style></secondary-title></titles><keywords><keyword><style face="normal" font="default" size="100%">computational sustainability</style></keyword><keyword><style face="normal" font="default" size="100%">Good-Turing estimate</style></keyword><keyword><style face="normal" font="default" size="100%">grant-wici16</style></keyword><keyword><style face="normal" font="default" size="100%">invasive species management</style></keyword><keyword><style face="normal" font="default" size="100%">machine learning</style></keyword><keyword><style face="normal" font="default" size="100%">Markov decision processes</style></keyword><keyword><style face="normal" font="default" size="100%">mdp</style></keyword><keyword><style face="normal" font="default" size="100%">optimization</style></keyword><keyword><style face="normal" font="default" size="100%">reinforcement-learning</style></keyword></keywords><dates><year><style face="normal" font="default" size="100%">2015</style></year></dates><urls><web-urls><url><style face="normal" font="default" 
size="100%">http://jmlr.org/papers/v16/taleghan15a.html</style></url></web-urls></urls><volume><style face="normal" font="default" size="100%">16</style></volume><pages><style face="normal" font="default" size="100%">3877–3903</style></pages><language><style face="normal" font="default" size="100%">eng</style></language><abstract><style face="normal" font="default" size="100%">In a simulator-defined MDP, the Markovian dynamics and rewards are provided in the form of a simulator from which samples can be drawn. This paper studies MDP planning algorithms that attempt to minimize the number of simulator calls before terminating and outputting a policy that is approximately optimal with high probability. The paper introduces two heuristics for efficient exploration and an improved confidence interval that enables earlier termination with probabilistic guarantees. We prove that the heuristics and the confidence interval are sound and produce, with high probability, an approximately optimal policy in polynomial time. Experiments on two benchmark problems and two instances of an invasive species management problem show that the improved confidence intervals and the new search heuristics yield reductions of between 8% and 47% in the number of simulator calls required to reach near-optimal policies.</style></abstract></record></records></xml>