Towards Deep Learning System and Algorithm Co-design

Wu, Yanzhao
Liu, Ling
Pu, Calton
Big data-powered deep learning (DL) systems and applications have blossomed in recent years. However, deep learning still faces several critical challenges. First, it is well known that DL models may fail to deliver the expected accuracy under adverse conditions. For example, a well-trained DL model may not only fail on unseen examples but is also vulnerable to adversarial examples, which can cause drastic accuracy degradation. Second, the high complexity of DL models and DL frameworks involves many system-level parameters and algorithm-specific hyperparameters, such as CPU, GPU, and memory configurations, learning rates, batch sizes, and optimizers, which complicates tuning these parameters to optimize deep learning performance. Third, beyond the demand for more robust and efficient deep learning models, there is growing interest in deploying model inference and model learning at the edge of the Internet, where data are generated but computing resources are limited. These challenges demand the co-design of deep learning systems and algorithms to optimize the performance of deep learning systems and deep learning as a service.

This dissertation research takes a holistic approach to deep learning system and algorithm co-design, making three original contributions to tackle these challenges. First, we develop a systematic framework for creating ensembles of failure-independent models by leveraging system and algorithm co-design for prediction fusion through diversity-based ensemble optimizations. We introduce the concept of focal diversity to capture diversity through high failure independence and low negative correlation. We develop focal-model-based ensemble diversity metrics to compose high-quality ensembles from complementary member models, which effectively boosts the overall accuracy of ensemble learning.
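The focal diversity idea can be illustrated with a small sketch: given a boolean matrix of per-model, per-sample correctness, score each candidate team by how often teammates succeed on the samples where a focal model fails, and keep the team with the best worst-case score across focal models. The metric below is a simplified proxy for failure independence, not the dissertation's exact focal diversity formula, and the function names are illustrative.

```python
import itertools
import numpy as np

def focal_diversity(correct, team, focal):
    """Fraction of the focal model's failures on which at least one
    teammate succeeds -- a simple proxy for failure independence.
    (Illustrative metric, not the dissertation's exact formula.)"""
    failures = ~correct[focal]                 # samples the focal model gets wrong
    if failures.sum() == 0:
        return 1.0
    others = [m for m in team if m != focal]
    fail_idx = np.where(failures)[0]
    rescued = np.any(correct[np.ix_(others, fail_idx)], axis=0)
    return float(rescued.mean())

def select_ensemble(correct, team_size):
    """Enumerate candidate teams and pick the one whose minimum
    focal diversity (over all focal models in the team) is highest."""
    n_models = correct.shape[0]
    best_team, best_score = None, -1.0
    for team in itertools.combinations(range(n_models), team_size):
        score = min(focal_diversity(correct, team, f) for f in team)
        if score > best_score:
            best_team, best_score = team, score
    return best_team, best_score
```

Exhaustive enumeration is only feasible for small model pools; the ensemble pruning strategies described above serve to cut down this search space.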
We develop ensemble selection algorithms based on a suite of ensemble pruning strategies, which select ensemble teams of high diversity and prune low-diversity ensembles. In addition to combining DL models trained for the same learning task, we develop a two-tier heterogeneous ensemble learning approach that combines DL models trained for different learning tasks, such as object detection and semantic segmentation. Our formal analysis and empirical results demonstrate the effectiveness of this system and algorithm co-design for high-diversity ensemble learning. Our EnsembleBench tool has been successfully used in adversarial learning to strengthen overall robustness.

Second, we develop a methodical approach to configuration management of deep learning frameworks by exploring the intrinsic correlations between system-level parameters and algorithm-specific hyperparameters, and how different combinations impact the performance of deep learning models. The core system parameters include configurations of CPU, GPU, memory, parallel processing, and multi-thread management; the DNN algorithm-specific hyperparameters include learning rate policies, optimizers, batch sizes, and so forth. For example, we characterize CPU/GPU resource usage patterns under different configurations and different DL frameworks to gain an in-depth understanding of how varying batch sizes and learning rate policies impact deep learning model performance. We also provide a set of metrics for evaluating and selecting learning rate policies, including classification confidence, variance, cost, and robustness. Two benchmarking tools, GTDLBench and LRBench, are made publicly available.

Third, we leverage system and algorithm co-design through a suite of optimization techniques to enable DNN inference and DNN learning at the edge.
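The learning rate policy families mentioned above can be sketched as a single schedule function over training iterations; the function name, parameter defaults, and the particular policies shown (fixed, step decay, triangular cyclic) are illustrative choices, not LRBench's actual API.

```python
def lr_policy(policy, k, base_lr=0.1, gamma=0.5, step=1000,
              max_lr=0.5, half_cycle=2000):
    """Learning rate at iteration k for a few common policy families.
    (Hypothetical helper; defaults are illustrative, not LRBench's.)"""
    if policy == "fixed":
        return base_lr
    if policy == "step":
        # decay by `gamma` every `step` iterations
        return base_lr * gamma ** (k // step)
    if policy == "triangular":
        # cyclic LR oscillating between base_lr and max_lr,
        # peaking every 2 * half_cycle iterations
        cycle = 1 + k // (2 * half_cycle)
        x = abs(k / half_cycle - 2 * cycle + 1)
        return base_lr + (max_lr - base_lr) * max(0.0, 1 - x)
    raise ValueError(f"unknown policy: {policy}")
```

Metrics such as classification confidence, variance, cost, and robustness can then be computed by training under each candidate schedule and comparing the resulting models.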
For example, edge video analytics is a core component of many real-time deep learning systems, such as autonomous driving, video surveillance, and the Internet of smart cameras. Edge server load surges and Wi-Fi network bandwidth saturation can further aggravate the mismatch between the incoming video streaming rate in frames per second (FPS) and the detection processing rate, which often results in random frame dropping. We explore a detection-model parallel execution approach that leverages multi-model, multi-device detection parallelism for fast object detection at the edge, improving throughput to meet runtime performance requirements.
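The multi-model, multi-device parallelism can be sketched as a shared frame queue feeding detector replicas, so aggregate throughput approaches the sum of per-replica detection rates instead of dropping frames. Here `detect_fns` stands in for per-device detection models, and the threading layout is an assumption for illustration, not the system's actual implementation.

```python
import queue
import threading

def run_parallel_detection(frames, detect_fns):
    """Feed frames through a shared queue to one worker per detector
    replica; idle replicas pull the next frame as soon as they finish."""
    frame_q = queue.Queue()
    results = []
    lock = threading.Lock()

    def worker(detect):
        while True:
            item = frame_q.get()
            if item is None:          # sentinel: no more frames
                break
            idx, frame = item
            out = detect(frame)
            with lock:
                results.append((idx, out))

    threads = [threading.Thread(target=worker, args=(fn,)) for fn in detect_fns]
    for t in threads:
        t.start()
    for i, frame in enumerate(frames):
        frame_q.put((i, frame))
    for _ in threads:                 # one sentinel per worker
        frame_q.put(None)
    for t in threads:
        t.join()
    # restore frame order, since replicas finish out of order
    return [out for _, out in sorted(results)]
```

With compute-bound detectors on separate devices, each replica runs independently; the shared queue naturally balances load when replicas run at different speeds.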