The Contextual Bandits Problem: Techniques for Learning to Make High-Reward Decisions

Author(s)
Schapire, Robert
Advisor(s)
Editor(s)
Associated Organization(s)
Organizational Unit
Series
Collections
Supplementary to:
Abstract
We consider how to learn through experience to make intelligent decisions. In the generic setting, called the contextual bandits problem, the learner must repeatedly decide which action to take in response to an observed context, and is then permitted to observe the received reward, but only for the chosen action. The goal is to learn to behave nearly as well as the best policy (or decision rule) in some possibly very large and rich space of candidate policies. This talk will describe progress on developing general methods for this problem and some of its variants.
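The interaction protocol described in the abstract can be illustrated with a small sketch. This is not the method presented in the talk; it is a plain epsilon-greedy learner over a finite list of candidate policies, using inverse-propensity estimates of each policy's reward. The names run_contextual_bandit, policies, reward_fn, epsilon, and n_actions are hypothetical and chosen only for illustration.

```python
import random


def run_contextual_bandit(policies, contexts, reward_fn,
                          n_actions=2, epsilon=0.1, seed=0):
    """Epsilon-greedy over a finite policy class (illustrative assumption).

    policies  : list of functions mapping a context to an action index
    contexts  : iterable of observed contexts, one per round
    reward_fn : hypothetical environment; returns the reward of ONE action
    """
    rng = random.Random(seed)
    # Inverse-propensity estimate of each policy's cumulative reward.
    estimates = [0.0] * len(policies)
    total_reward = 0.0

    for x in contexts:
        # Follow the policy that currently looks best.
        best = max(range(len(policies)), key=lambda i: estimates[i])
        greedy_action = policies[best](x)

        # With probability epsilon, explore uniformly at random instead.
        if rng.random() < epsilon:
            action = rng.randrange(n_actions)
        else:
            action = greedy_action

        # Probability with which the chosen action was selected.
        prob = epsilon / n_actions
        if action == greedy_action:
            prob += 1.0 - epsilon

        # Bandit feedback: only the chosen action's reward is observed.
        reward = reward_fn(x, action)
        total_reward += reward

        # Credit every policy that would have chosen the same action,
        # reweighting by 1/prob to correct for the sampling bias.
        for i, policy in enumerate(policies):
            if policy(x) == action:
                estimates[i] += reward / prob

    return total_reward, estimates
```

The learner never sees the rewards of actions it did not take, so the 1/prob reweighting is what allows it to compare all candidate policies from the same stream of experience. Enumerating the policies explicitly, as done here, only works for small policy classes; the "very large and rich" spaces mentioned in the abstract need more efficient methods.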
Sponsor
Date
2017-10-30
Extent
63:55 minutes
Resource Type
Moving Image
Resource Subtype
Lecture
Rights Statement
Rights URI