Deep Reinforcement Learning for the Velocity Control of a Magnetic, Tethered Differential-Drive Robot

Rawal, Devarsi Paresh
Pradalier, Cédric
Lau, Mackenzie
Ha, Sehoon
The ROBOPLANET Altiscan crawler is a magnetic-wheeled, differential-drive robot being explored as an option to aid, if not completely replace, humans in the inspection and maintenance of marine vessels. Velocity control of the crawler is crucial to establishing trust and reliability among its operators. However, owing to the crawler's elongated magnetic wheels and umbilical tether, it operates in a complex environment rich in nonlinear dynamics, which makes control challenging. Model-based approaches to robot control, which aim to mathematically formalize the physics of the system, require in-depth domain knowledge. Reinforcement learning (RL) is a trial-and-error-based approach that can solve control problems in nonlinear systems. To accommodate high-dimensional, continuous state spaces, deep neural networks (DNNs) can serve as nonlinear function approximators that extend RL, yielding a method known as deep reinforcement learning (DRL). DRL coupled with a simulated environment allows a model to learn physics-naive control. The research conducted in this thesis explored the efficacy of a DRL algorithm, proximal policy optimization (PPO), in learning velocity control of the Altiscan crawler by modeling its operating environment in Isaac Gym, a novel, GPU-accelerated simulation package. The approaches were evaluated on the error between the crawler's measured base velocities, resulting from the actions of the DRL model, and the target velocities in six different environments. Two variants of PPO, standard and recurrent, were compared against the inverse velocity kinematics model of a differential-drive robot. The results show that velocity control in simulation is possible using PPO, but evaluation on the real crawler is needed to reach a meaningful conclusion.
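The abstract's baseline, the inverse velocity kinematics of a differential-drive robot, can be sketched as follows. This is a minimal illustration of the standard differential-drive model, not the thesis's implementation; the wheel radius and track width used here are placeholder values, not the Altiscan crawler's actual dimensions.

```python
# Inverse velocity kinematics of a differential-drive robot:
# map a commanded base twist (linear velocity v, angular velocity omega)
# to left/right wheel angular velocities. This idealized model ignores
# the nonlinear effects (magnetic wheels, tether drag) that motivate
# the DRL approach in the thesis.

WHEEL_RADIUS = 0.05  # meters (illustrative placeholder)
TRACK_WIDTH = 0.30   # distance between wheel centers, meters (placeholder)

def inverse_velocity_kinematics(v, omega, r=WHEEL_RADIUS, b=TRACK_WIDTH):
    """Return (omega_left, omega_right) wheel speeds in rad/s for a
    target base twist v [m/s], omega [rad/s]."""
    omega_left = (v - omega * b / 2.0) / r
    omega_right = (v + omega * b / 2.0) / r
    return omega_left, omega_right

# Example: driving straight at 0.1 m/s commands equal wheel speeds;
# a pure rotation commands equal and opposite ones.
print(inverse_velocity_kinematics(0.1, 0.0))  # (2.0, 2.0)
print(inverse_velocity_kinematics(0.0, 1.0))  # (-3.0, 3.0)
```

A learned policy can be compared against this baseline by measuring the error between the base velocities each produces and the target velocities, as the thesis does across its six simulated environments.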