Deep learning for building and validating geometric and semantic maps

Lambert, John
Hays, James
Dellaert, Frank
Mapping the world is an essential tool for making spatial artificial intelligence a reality in the near future. Spatial AI, or embodied intelligence for 3D perception, enables awareness and understanding of our surroundings. Maps serve as a core workhorse of motion prediction and motion planning for modern autonomous vehicles. Maps also enable human users to interact with novel 3D spaces remotely via virtual reality (VR), or to convey useful information about an environment through augmented reality (AR). Current methods for building and validating geometric and semantic maps are limited in several ways. For example, floorplan maps constructed from sparse camera views within indoor environments generally suffer from low completeness. In other domains, such as city streets, the world is ever-changing, making online validation of high-definition (HD) maps a requirement for today’s self-driving vehicles; however, many current map change detection methods suffer from high storage costs or limited accuracy. This dissertation introduces new algorithms for building and validating geometric and semantic maps using deep learning, with three original contributions. First, I develop a new learning-based algorithm, SALVe, for creating complete and accurate 2D geometric maps (floorplans) under very wide baselines and occlusion. Second, I explore the role of the deep "front end" in Structure-from-Motion (SfM) and analyze its use in GTSFM, a new system for global SfM. Finally, I introduce learning-based formulations for solving the HD map change detection task in both a bird’s-eye view and an ego-view. Because real map changes are infrequent and vector maps are easy to manipulate synthetically, I rely on simulated data to train such models. Perhaps surprisingly, I show that such models can generalize to real-world distributions.
Along the way, to satisfy the demands of these data-driven deep learning approaches, I contribute several large-scale datasets towards solving these problems: the Argoverse 1.0 Datasets, the MSeg Dataset, the Trust but Verify (TbV) Dataset, and the Argoverse 2.0 Datasets.