Global Optimality Guarantees for Policy Gradient Methods

Author(s)
Russo, Daniel
Abstract
Policy gradient methods are perhaps the most widely used class of reinforcement learning algorithms. These methods apply to complex, poorly understood control problems by performing stochastic gradient descent over a parameterized class of policies. Unfortunately, due to the multi-period nature of the objective, policy gradient algorithms face non-convex optimization problems and can get stuck in suboptimal local minima even for extremely simple problems. This talk will discuss structural properties, shared by several canonical control problems, that guarantee the policy gradient objective function has no suboptimal stationary points despite being non-convex. Time permitting, I’ll also discuss (1) convergence rates that follow as a consequence of this theory and (2) consequences of this theory for policy gradient methods applied with highly expressive policy classes. * This talk is based on ongoing joint work with Jalaj Bhandari.
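To make "stochastic gradient descent over a parameterized class of policies" concrete, here is a minimal REINFORCE-style sketch on a hypothetical two-armed bandit (a one-step problem, so it does not exhibit the multi-period non-convexity the talk addresses). The reward means, step size, and softmax parameterization are illustrative assumptions, not details from the talk.

```python
# Minimal policy gradient sketch: softmax policy on a toy two-armed bandit.
# Illustrative only; parameters and problem are assumptions, not from the talk.
import numpy as np

rng = np.random.default_rng(0)
true_means = np.array([0.2, 0.8])   # assumed expected reward of each action
theta = np.zeros(2)                 # parameters of a softmax policy

def softmax(x):
    z = x - x.max()
    e = np.exp(z)
    return e / e.sum()

step_size = 0.1
for _ in range(2000):
    probs = softmax(theta)
    a = rng.choice(2, p=probs)                  # sample an action from the policy
    r = rng.normal(true_means[a], 0.1)          # noisy reward observation
    grad_log_pi = -probs
    grad_log_pi[a] += 1.0                       # gradient of log pi(a | theta) for softmax
    theta += step_size * r * grad_log_pi        # stochastic gradient step on expected reward

print(softmax(theta))  # probability mass should concentrate on the better action
```

In multi-period control problems the same update is applied to trajectory returns, and it is there that the objective becomes non-convex in the policy parameters, which is the setting the talk's structural conditions address.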
Date
2020-03-11
Extent
59:17 minutes
Resource Type
Moving Image
Resource Subtype
Lecture