Title:
Leveraging Value-awareness for Online and Offline Model-based Reinforcement Learning
dc.contributor.advisor | Batra, Dhruv | |
dc.contributor.author | Modhe, Nirbhay | |
dc.contributor.committeeMember | Sukhatme, Gaurav | |
dc.contributor.committeeMember | Kalyan, Ashwin | |
dc.contributor.committeeMember | Kira, Zsolt | |
dc.contributor.committeeMember | Riedl, Mark | |
dc.contributor.department | Interactive Computing | |
dc.date.accessioned | 2023-01-10T16:24:58Z | |
dc.date.available | 2023-01-10T16:24:58Z | |
dc.date.created | 2022-12 | |
dc.date.issued | 2022-12-07 | |
dc.date.submitted | December 2022 | |
dc.date.updated | 2023-01-10T16:24:58Z | |
dc.description.abstract | Model-based Reinforcement Learning (RL) lies at the intersection of planning and learning for sequential decision making. Value-awareness in model learning has recently emerged as a means to imbue task or reward information into the objective of model learning, so that the model can leverage the specificity of a task. While shown in theory to be superior to maximum likelihood estimation in the context of (online) model-based RL, value-awareness has remained impractical for most non-trivial tasks. This thesis aims to bridge the gap between theory and practice by applying the principle of value-awareness to two settings: the online RL setting and the offline RL setting. First, within online RL, this thesis revisits value-aware model learning from the perspective of minimizing the performance difference, obtaining a novel value-aware model learning objective as a direct upper bound of it. Then, this thesis investigates and remedies the issue of stale value estimates that has so far been holding back the practicality of value-aware model learning. Using the proposed remedy, performance improvements are presented over maximum-likelihood-based baselines and existing value-aware objectives in several continuous control tasks, while also enabling existing value-aware objectives to become performant. In the offline RL context, this thesis takes a step back from model learning and applies value-awareness towards better data augmentation. Such data augmentation, when applied to model-based offline RL algorithms, allows for leveraging unseen states with low epistemic uncertainty that have previously not been reachable within the assumptions and limitations of model-based offline RL. Value-aware state augmentations are found to enable better performance on offline RL benchmarks compared to existing baselines and non-value-aware alternatives. | |
dc.description.degree | Ph.D. | |
dc.format.mimetype | application/pdf | |
dc.identifier.uri | http://hdl.handle.net/1853/70162 | |
dc.language.iso | en_US | |
dc.publisher | Georgia Institute of Technology | |
dc.subject | Reinforcement learning | |
dc.subject | Model-based reinforcement learning | |
dc.subject | Value-aware model learning | |
dc.subject | Offline reinforcement learning | |
dc.title | Leveraging Value-awareness for Online and Offline Model-based Reinforcement Learning | |
dc.type | Text | |
dc.type.genre | Dissertation | |
dspace.entity.type | Publication | |
local.contributor.advisor | Batra, Dhruv | |
local.contributor.corporatename | College of Computing | |
local.contributor.corporatename | School of Interactive Computing | |
relation.isAdvisorOfPublication | bbee09e1-a4fa-4d99-9dfd-b0605fea0f11 | |
relation.isOrgUnitOfPublication | c8892b3c-8db6-4b7b-a33a-1b67f7db2021 | |
relation.isOrgUnitOfPublication | aac3f010-e629-4d08-8276-81143eeaf5cc | |
thesis.degree.level | Doctoral |