Title
Navigating to Objects: Simulation, Data, and Models

Author(s)
Ramrakhya, Ram
Advisor(s)
Batra, Dhruv
Abstract
General-purpose robots that can perform a diverse set of embodied tasks in a diverse set of environments have to be good at visual exploration. Consider the canonical example of asking a household robot, ‘Where are my keys?’. To answer this (assuming the robot does not already know the answer), the robot would have to search the house, often guided by intelligent priors: peeking into the washroom or kitchen might be enough to be reasonably sure the keys are not there, while exhaustively searching the living room might be much more important since the keys are more likely to be there. While doing so, the robot has to internally keep track of where it has already been to avoid redundant search, and it might also have to interact with objects, e.g. check drawers and cabinets in the living room (but not those in the washroom or kitchen!). This example illustrates fairly sophisticated exploration, involving a careful interplay of implicit objectives (semantic priors, exhaustive search, efficient navigation, interaction, etc.) that are hard to learn using Reinforcement Learning (RL). In this thesis, we focus on learning such embodied object-search strategies from human demonstrations, which implicitly capture the intelligent behavior we wish to impart to our agents. In Part I, we present a large-scale study of imitating human demonstrations on tasks that require a virtual robot to search for objects in new environments: (1) ObjectGoal Navigation (e.g. ‘find & go to a chair’) and (2) PICK&PLACE (e.g. ‘find mug, pick mug, find counter, place mug on counter’). In Part II, we shift our focus to improving agents trained on human demonstrations in a tractable way. Towards this, we present PIRLNav, a two-stage learning scheme consisting of behavior cloning (BC) pretraining on human demonstrations followed by RL finetuning. Finally, using this BC→RL training recipe, we present a rigorous empirical analysis investigating whether human demonstrations can be replaced with ‘free’ (automatically generated) sources of demonstrations, e.g. shortest-path (SP) or task-agnostic frontier exploration (FE) trajectories.
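As a rough illustration of the BC→RL recipe mentioned above, the sketch below pretrains a small policy with behavior cloning on (observation, action) pairs and then continues optimizing the same policy with a simple REINFORCE-style reward-driven update. The network, dimensions, synthetic "demonstrations", and toy reward are placeholders chosen only for illustration; they are not the thesis implementation, which trains navigation agents in simulation with a more careful on-policy finetuning schedule.

```python
# Minimal, illustrative sketch of a BC -> RL two-stage training recipe.
# Everything here (policy size, data, reward) is a toy stand-in.
import torch
import torch.nn as nn
import torch.nn.functional as F

OBS_DIM, NUM_ACTIONS = 16, 4  # hypothetical observation/action sizes

class Policy(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(OBS_DIM, 64), nn.ReLU(), nn.Linear(64, NUM_ACTIONS)
        )

    def forward(self, obs):
        return self.net(obs)  # action logits

policy = Policy()
opt = torch.optim.Adam(policy.parameters(), lr=1e-3)

# Stage 1: behavior cloning on (observation, action) pairs.
# Random tensors stand in for real demonstration trajectories.
demo_obs = torch.randn(512, OBS_DIM)
demo_act = torch.randint(0, NUM_ACTIONS, (512,))
for _ in range(100):
    bc_loss = F.cross_entropy(policy(demo_obs), demo_act)
    opt.zero_grad()
    bc_loss.backward()
    opt.step()

# Stage 2: RL finetuning of the BC-pretrained policy with a
# REINFORCE-style update on a toy reward (action 0 is "correct").
for _ in range(100):
    obs = torch.randn(64, OBS_DIM)  # stand-in for environment observations
    dist = torch.distributions.Categorical(logits=policy(obs))
    actions = dist.sample()
    reward = (actions == 0).float()
    rl_loss = -(dist.log_prob(actions) * (reward - reward.mean())).mean()
    opt.zero_grad()
    rl_loss.backward()
    opt.step()
```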
Date Issued
2023-05-03
Resource Type
Text
Resource Subtype
Thesis