Towards Safe and Efficient Learning for Dexterous Manipulation

Jain, Abhineet
Ravichandar, Harish
Imitation learning (IL) is a promising approach to help robots acquire dexterous manipulation capabilities without the need for a carefully designed reward or significant computational effort. However, existing IL approaches require sophisticated data collection infrastructure and struggle to generalize beyond the training distribution. Other approaches, such as reinforcement learning, interact with the environment to train black-box neural networks, providing little control over how the robot learns and performs the skills. Such approaches are especially challenging on physical platforms, where a robot’s erratic behavior can harm not only the robot itself but also the entities around it. Since training uses a large number of interactions with the environment, there is also room for improving sample efficiency. In this work, we address these challenges. First, we demonstrate that we can learn an object relocation task using demonstrations from LeapMotion, an inexpensive vision-based sensor. Policies trained on these demonstrations achieve success similar to that of policies trained on demonstrations from wearable sensors, at the cost of sample efficiency. Using this sensor reduces the setup cost of collecting demonstrations by approximately 140x. Second, we investigate the importance of collecting additional data that better represents the full operating conditions. We compare the performance of corrective additional demonstrations and randomly sampled additional demonstrations for an object relocation task. When the additional demonstrations from the full task distribution outnumber the original demonstrations from a restrictive training distribution, the corrective demonstrations considerably outperform the randomly sampled ones. Otherwise, there are no significant differences between the two. Third, we introduce a simple geometric constraint to guide the robot when learning object relocation. Using Constrained Policy Optimization, the robot quickly learns to move towards the object and needs a similar number of samples to learn the skill as the unconstrained approach. We show how simple constraints can help robots achieve sensible and safe behavior quickly and ease concerns surrounding hardware deployment. We also provide insights into how different degrees of strictness of these constraints affect learning. Finally, we curate a library of constraints that generalize across multiple dexterous manipulation tasks, and introduce a hierarchical approach that prioritizes these constraints across different phases of the task. We train two policies to learn an object relocation task: a low-level policy that learns how to perform the task given a set of constraints, and a high-level policy that decides which constraints to activate during different stages of training the low-level policy. With this hierarchical approach, the agent learns to perform the task while reducing the duration of unsafe behaviors during training. Our findings indicate that prioritizing the right constraints addresses physical and environmental safety concerns, as the hierarchical policy can both train and perform the tasks safely.
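To make the idea of a geometric constraint with adjustable strictness more concrete, here is a minimal sketch in Python. It is not the thesis's implementation: the abstract does not specify the constraint, so the cost function `approach_cost` (a step is penalized when the palm moves away from the object), the tolerance `tol`, and the helper names are all hypothetical. The only part taken from the general Constrained Policy Optimization setting is that a trajectory's discounted cumulative cost is compared against a limit, with a smaller limit corresponding to a stricter constraint.

```python
import numpy as np


def approach_cost(palm_prev, palm_next, obj_pos, tol=0.0):
    """Hypothetical per-step cost: 1.0 if the palm moved away from the object, else 0.0."""
    obj = np.asarray(obj_pos, dtype=float)
    d_prev = np.linalg.norm(obj - np.asarray(palm_prev, dtype=float))
    d_next = np.linalg.norm(obj - np.asarray(palm_next, dtype=float))
    return float(d_next > d_prev + tol)


def discounted_cost(costs, gamma=0.99):
    """Discounted sum of per-step costs along one trajectory."""
    total = 0.0
    for c in reversed(costs):
        total = c + gamma * total
    return total


def satisfies_limit(costs, limit, gamma=0.99):
    """True if the trajectory's discounted cost stays within the limit.

    A smaller `limit` plays the role of a stricter constraint (assumed mapping).
    """
    return discounted_cost(costs, gamma) <= limit


# Illustrative trajectory: the palm first approaches the object, then retreats.
obj = [1.0, 0.0, 0.0]
palm_path = [[0.0, 0.0, 0.0], [0.1, 0.0, 0.0], [0.05, 0.0, 0.0]]
step_costs = [
    approach_cost(p0, p1, obj) for p0, p1 in zip(palm_path[:-1], palm_path[1:])
]
```

In a constrained policy search, a per-step cost like this would be logged alongside the reward, and the limit passed to `satisfies_limit` would be the knob that sets how strictly the behavior is enforced.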