<?xml version="1.0" encoding="UTF-8"?><xml><records><record><source-app name="Biblio" version="7.x">Drupal-Biblio</source-app><ref-type>47</ref-type><contributors><authors><author><style face="normal" font="default" size="100%">Crowley, Mark</style></author><author><style face="normal" font="default" size="100%">Poole, David</style></author></authors></contributors><titles><title><style face="normal" font="default" size="100%">Policy gradient planning for environmental decision making with existing simulators</style></title><secondary-title><style face="normal" font="default" size="100%">25th AAAI Conference on Artificial Intelligence (AAAI-11)</style></secondary-title></titles><keywords><keyword><style face="normal" font="default" size="100%">mdp</style></keyword><keyword><style face="normal" font="default" size="100%">reinforcement learning</style></keyword><keyword><style face="normal" font="default" size="100%">spatiotemporal planning</style></keyword><keyword><style face="normal" font="default" size="100%">thesis research</style></keyword></keywords><dates><year><style face="normal" font="default" size="100%">2011</style></year></dates><urls><web-urls><url><style face="normal" font="default" size="100%">https://www.scopus.com/record/display.uri?eid=2-s2.0-80055051332&amp;origin=inward&amp;txGid=de2006c39235aac9ba20cf0e76073dd9</style></url></web-urls></urls><pub-location><style face="normal" font="default" size="100%">San Francisco</style></pub-location><volume><style face="normal" font="default" size="100%">2</style></volume><pages><style face="normal" font="default" size="100%">1323–1330</style></pages><isbn><style face="normal" font="default" size="100%">9781577355090</style></isbn><language><style face="normal" font="default" size="100%">eng</style></language><abstract><style face="normal" font="default" size="100%">In environmental and natural resource planning domains, actions are taken at a large number of locations over multiple time periods.
These problems have enormous state and action spaces, spatial correlation between actions, uncertainty, and complex utility models. We present an approach for modeling these planning problems as factored Markov decision processes. The reward model can contain local and global components as well as spatial constraints between locations. The transition dynamics can be provided by existing simulators developed by domain experts. We propose a landscape policy, defined as the equilibrium distribution of a Markov chain built from many locally parameterized policies. This policy is optimized using a policy gradient algorithm. Experiments using a forestry simulator demonstrate the algorithm's ability to devise policies for sustainable harvest planning of a forest. Copyright © 2011, Association for the Advancement of Artificial Intelligence. All rights reserved.</style></abstract><issue><style face="normal" font="default" size="100%">1</style></issue></record></records></xml>