Title:
Robot Learning from Heterogeneous Demonstration

dc.contributor.advisor Gombolay, Matthew
dc.contributor.author Chen, Letian Zac
dc.contributor.committeeMember Chernova, Sonia
dc.contributor.committeeMember Ravichandar, Harish
dc.contributor.department Computer Science
dc.date.accessioned 2021-06-10T16:48:44Z
dc.date.available 2021-06-10T16:48:44Z
dc.date.created 2020-05
dc.date.issued 2020-04-28
dc.date.submitted May 2020
dc.date.updated 2021-06-10T16:48:44Z
dc.description.abstract Learning from Demonstration (LfD) has become a ubiquitous and user-friendly technique for teaching a robot to perform a task (e.g., playing Ping Pong) without the need for a traditional programming language (e.g., C++). As these systems are increasingly placed in the hands of everyday users, researchers face the reality that end-users are a heterogeneous population with varying levels of skill and experience. This heterogeneity violates the near-universal assumption in LfD algorithms that users' demonstrations are near-optimal and uniform in how the task is accomplished. In this thesis, I present algorithms that tackle two specific types of heterogeneity: heterogeneous strategy and heterogeneous performance. First, I present Multi-Strategy Reward Distillation (MSRD), which tackles the problem of learning from users who have adopted heterogeneous strategies. MSRD distills a shared task reward and per-demonstrator strategy rewards, which represent the task specification and each demonstrator's strategic preference, respectively. The extracted task reward achieves 0.998 and 0.943 correlation with the ground-truth reward on two simulated robotic tasks, and I successfully deploy it on a real-robot table-tennis task. Second, I develop two algorithms to address the problem of learning from suboptimal demonstrations: SSRR and OP-AIRL. SSRR is a novel mechanism that regresses over noisy demonstrations to infer an idealized reward function, and OP-AIRL is a mechanism that learns a policy that more effectively teases out ambiguity from suboptimal demonstrations. By combining SSRR with OP-AIRL, I achieve a 688% and a 254% improvement over the state of the art on two simulated robot tasks.
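One way to read the MSRD decomposition described in the abstract (a sketch inferred from the abstract alone; the precise formulation and any regularization appear in the thesis itself) is that each demonstrator i's reward splits into a shared task term and a strategy-specific term:

\[ r_i(s, a) = r_{\mathrm{task}}(s, a) + r_{\mathrm{strategy}_i}(s, a) \]

where \(r_{\mathrm{task}}\) encodes what the task requires of every demonstration and \(r_{\mathrm{strategy}_i}\) encodes how demonstrator i prefers to accomplish it.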
dc.description.degree M.S.
dc.format.mimetype application/pdf
dc.identifier.uri http://hdl.handle.net/1853/64653
dc.language.iso en_US
dc.publisher Georgia Institute of Technology
dc.subject Learning from demonstration
dc.subject Robot learning
dc.subject Heterogeneous learning
dc.title Robot Learning from Heterogeneous Demonstration
dc.type Text
dc.type.genre Thesis
dspace.entity.type Publication
local.contributor.corporatename College of Computing
relation.isOrgUnitOfPublication c8892b3c-8db6-4b7b-a33a-1b67f7db2021
thesis.degree.level Masters
Files
Original bundle
Name: CHEN-THESIS-2020.pdf
Size: 4.21 MB
Format: Adobe Portable Document Format

License bundle
Name: LICENSE.txt
Size: 3.86 KB
Format: Plain Text