Self-Checking Error Resilient Smart Autonomous System Design

Author(s)
Amarnath, Chandramouli
Abstract
The increasing complexity of intelligent sense-and-control systems built from interacting subsystems, deployed in safety-critical roles such as autonomous driving, has driven research into failure detection, diagnosis, and correction methods for reliable operation under stringent safety standards such as ISO 26262. In this work we assert that cross-layer failure detection, diagnosis, and correction are essential for the safety and scalability of intelligent autonomous systems composed of multiple interacting subsystems. This is accomplished using multi-domain, scalable failure diagnosis driven by outlier detection, together with domain-specific failure correction. This work has two key focuses. (1) The first is cross-layer synergies for system resilience, which enhance the autonomous system’s safety through information sharing and coordination between its subsystems. On-line detection of and adaptation to failures in actuators and sensors, as well as errors in control or state-estimator computation, are investigated, and methods leveraging cross-layer subsystem interactions allow for rapid, on-line failure adaptation. (2) The second is secure, resilient execution of machine learning (ML) subsystems in roles such as semantic understanding (image classification) and reinforcement-learning-based control. Compute errors and security threats to ML subsystems during training and inference are detected in real time using reduced-dimension representations of intermediate features within the deep neural networks that make up these subsystems. Compute errors are suppressed for safe execution through adaptive statistical thresholding of neuron values followed by zeroing of potentially erroneous values. Together, these two thrusts enable resilient intelligent autonomous system design, using bottom-up cross-layer synergies for subsystem failures and module-level resilience methodologies (concurrent error detection, suppression, and security modules) in ML subsystems.
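The error-suppression step named in the abstract (statistical thresholding of neuron values followed by zeroing) can be illustrated with a minimal sketch. This is not the dissertation's implementation: the class name `NeuronThreshold`, the k-sigma rule, the value of `k`, and the running-statistics update are all assumptions chosen for illustration.

```python
import math


class NeuronThreshold:
    """Hypothetical per-layer monitor: running mean/variance of activations
    (Welford's algorithm) with a k-sigma acceptance band. Assumed design,
    not taken from the source."""

    def __init__(self, k=4.0):
        self.k = k        # band half-width in standard deviations (assumed)
        self.n = 0        # number of values seen so far
        self.mean = 0.0   # running mean
        self.m2 = 0.0     # running sum of squared deviations

    def update(self, values):
        """Fold activations believed to be error-free into the statistics."""
        for v in values:
            self.n += 1
            delta = v - self.mean
            self.mean += delta / self.n
            self.m2 += delta * (v - self.mean)

    def std(self):
        """Population standard deviation of the values seen so far."""
        return math.sqrt(self.m2 / self.n) if self.n > 1 else 0.0

    def suppress(self, values):
        """Zero every value that falls outside mean +/- k * std."""
        bound = self.k * self.std()
        return [v if abs(v - self.mean) <= bound else 0.0 for v in values]


layer = NeuronThreshold(k=4.0)
layer.update([0.1, 0.4, 0.0, 0.3, 0.2, 0.5])  # calibration on clean activations
faulty = [0.2, 0.3, 3.0e8, 0.1]               # third value mimics a bit-flip error
cleaned = layer.suppress(faulty)
print(cleaned)
```

In this sketch the threshold adapts only through calls to `update`; one plausible variant would feed values that pass the check back into the statistics during inference, but whether the dissertation does so is not stated in the abstract.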
Date
2024-12-07
Resource Type
Text
Resource Subtype
Dissertation