Adaptable Policies and Accelerated Infrastructure for Learning for Heterogeneous Multi-Robot Coordination

Author(s)
Jain, Shalin Anand
Associated Organization(s)
School of Computer Science
Abstract
Many real-world challenges require heterogeneous agents to work together, leveraging their diverse capabilities to complete complex tasks. Heterogeneous multi-robot teams are a promising platform for addressing these problems, and several algorithmic approaches exist for specifying coordinated behavior within such teams. Recently, learning-based approaches have emerged as a promising avenue for alleviating the technical expertise and domain knowledge required to explicitly specify robust coordination behaviors for complex tasks. However, adopters of learning-based approaches face two significant tradeoffs when applying learning to heterogeneous robot teams. (1) Policies for multi-robot teams can be represented as a single set of parameters shared across all robots, as a set of parameters learned for each class of robots, or as fully individual parameters learned for each robot. Existing shared-parameter designs prioritize sample efficiency by letting a single set of parameters learn from the experience of all robots (or all robots within a class) via input augmentations, but they tend to limit behavioral diversity. In contrast, learning separate policies for each robot enables greater diversity and expressivity at the cost of sample efficiency and generalization to unseen robots. (2) Existing platforms for training multi-robot policies force a tradeoff among optimization for multi-agent learning, robotics relevance, and sim-to-real deployment capability. Existing multi-agent benchmark simulators are highly optimized for learning with multiple agents, but they lack fidelity; existing robotics simulators offer high fidelity, but they are not optimized for simulating and training many interacting agents and do not support open-access sim-to-real deployment. In this work, we aim to make progress on both tradeoffs, at the architecture level and the infrastructure level.
In the architecture thrust, we view shared and individual parameters as two ends of a broader spectrum and propose a middle-ground approach: Capability-Aware Shared Hypernetworks (CASH). CASH is a soft weight-sharing architecture that uses hypernetworks to efficiently learn a flexible shared policy that dynamically adapts to each robot after training. CASH outperforms baseline architectures in performance and sample efficiency during both training and zero-shot generalization, all with 60%-80% fewer learnable parameters. In the infrastructure thrust, we contribute JaxRobotarium, a JAX-powered end-to-end simulation, learning, deployment, and benchmarking platform for the Robotarium. JaxRobotarium enables rapid training and deployment of multi-robot reinforcement learning (MRRL) policies with realistic robot dynamics and safety constraints, supporting both parallelization and hardware acceleration. With eight natively implemented benchmark tasks, we demonstrate that JaxRobotarium retains high simulation fidelity while achieving dramatic speedups over baselines (20x in training and 150x in simulation), and it provides an open-access sim-to-real evaluation pipeline through the Robotarium testbed, accelerating and democratizing access to multi-robot learning research and evaluation.
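The hypernetwork idea behind CASH can be illustrated with a minimal NumPy sketch: a single shared set of hypernetwork parameters maps each robot's capability vector to the weights of that robot's policy head, so robots with different capabilities exhibit different behavior from the same shared parameters. All names, dimensions, and the capability encoding below are illustrative assumptions, not the thesis implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions: observation, action, capability vector, hidden layer.
OBS_DIM, ACT_DIM, CAP_DIM, HID = 8, 2, 3, 16

# Size of the flattened target policy head (a single linear layer: W and b).
N_TARGET = OBS_DIM * ACT_DIM + ACT_DIM

# Shared hypernetwork parameters -- one set for the entire team.
W1 = rng.normal(0.0, 0.1, (CAP_DIM, HID))
b1 = np.zeros(HID)
W2 = rng.normal(0.0, 0.1, (HID, N_TARGET))
b2 = np.zeros(N_TARGET)

def policy_action(obs, cap):
    """Generate this robot's policy-head weights from its capabilities, then act."""
    h = np.tanh(cap @ W1 + b1)                # hypernetwork hidden layer
    theta = h @ W2 + b2                       # flat target-network parameters
    W = theta[: OBS_DIM * ACT_DIM].reshape(OBS_DIM, ACT_DIM)
    b = theta[OBS_DIM * ACT_DIM:]
    return np.tanh(obs @ W + b)               # capability-conditioned action

obs = rng.normal(size=OBS_DIM)
fast = np.array([1.0, 0.2, 0.5])              # hypothetical capability vectors,
slow = np.array([0.2, 1.0, 0.5])              # e.g. (speed, payload, sensing radius)
a_fast = policy_action(obs, fast)
a_slow = policy_action(obs, slow)
```

The key property is that the learnable parameters (`W1`, `b1`, `W2`, `b2`) are shared and train on every robot's experience, while the generated policy weights differ per robot, which is one way to sit between fully shared and fully individual parameterizations.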
Date
2025-12
Resource Type
Text
Resource Subtype
Thesis (Masters Degree)
Rights Statement
Rights URI