Policy Gradient Optimization Using Equilibrium Policies for Spatial Planning Domains