Speaker: Laurent Charlin
Planning in partially observable domains is a notoriously difficult problem. However, in many real-world scenarios, planning can be simplified by decomposing the task into a hierarchy of smaller planning problems. Several approaches have been proposed to optimize a policy that decomposes according to a hierarchy specified a priori. In this thesis, I investigate the problem of automatically discovering the hierarchy. More precisely, I frame the optimization of a hierarchical policy as a non-convex optimization problem that can be solved with general non-linear solvers, mixed-integer non-linear solvers, a mixed-integer linear approximation, or a form of bounded hierarchical policy iteration. By encoding the hierarchical structure as variables of the optimization problem, I can automatically discover a hierarchy. My method is flexible enough to allow any part of the hierarchy to be specified based on prior knowledge while letting the optimization discover the unknown parts. It can also discover hierarchical policies, including recursive policies, that are more compact (potentially requiring infinitely fewer parameters). This work was done in collaboration with Pascal Poupart and Romy Shioda.
Note: This talk will be very similar in content to the one I gave for my master's presentation.
Food: Robin Cohen