Title:
System support for Collaborative Learning in Distributed Edge Cloud Environment

Author(s)
Daga, Harshit
Advisor(s)
Gavrilovska, Ada
Abstract
Multi-access Edge Computing (MEC) provides a highly distributed environment for emerging classes of real-time applications. These applications rely on machine learning to create value from data at the edge by building a global shared model. Existing techniques provide good predictive performance but incur data-movement and model-transfer overheads in the critical path of learning. In addition, existing machine learning systems offer limited flexibility for modifying the configuration of a deployed learning application in response to changes in the workload, infrastructure, requirements, or scale. This thesis explores the unique nature of edge workloads to provide system support for a new form of collaborative learning, and aims to close the gaps in existing solutions through the following contributions. We build Cartel – a new system for collaborative machine learning at the edge cloud. A global shared model, although trained on a broader variety of data, may not be needed in full at each edge node. An alternative is to train and update models online at each edge with local data, in isolation from other edges. However, changes in the environment or variations in workload patterns at an edge node can degrade the predictive performance of its model and require adaptation. Cartel provides a jump start by transferring relevant knowledge from other edge node(s) where similar patterns have been observed. This allows for more lightweight models (up to 3x smaller), reduced backhaul data transfer (by a few orders of magnitude), and reduced training time, with performance comparable to centralized models and strictly improved model accuracy compared to learning in isolation. Collaborative learning relies on knowledge transfer mechanisms to enable collaboration among peers.
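The jump-start idea can be illustrated with a minimal Python sketch. This is not Cartel's implementation: the similarity metric (cosine similarity over per-class workload histograms), the threshold, and the whole-model weight transfer are all simplifying assumptions made here for illustration, and every name is hypothetical.

```python
import numpy as np

def workload_similarity(hist_a, hist_b):
    """Cosine similarity between two class-distribution histograms.
    (One illustrative metric; the system's actual metric may differ.)"""
    a, b = np.asarray(hist_a, float), np.asarray(hist_b, float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def jump_start(target_weights, peers, target_hist, threshold=0.9):
    """Seed the target node's model with weights from the most similar peer.
    `peers` maps node id -> (class histogram, model weights)."""
    best_id, best_sim = None, threshold
    for node_id, (hist, _) in peers.items():
        sim = workload_similarity(target_hist, hist)
        if sim >= best_sim:
            best_id, best_sim = node_id, sim
    if best_id is None:
        # No sufficiently similar peer: keep learning in isolation.
        return target_weights
    # Transfer the helper's knowledge as the starting point.
    return peers[best_id][1].copy()
```

A node whose workload histogram closely matches a peer's inherits that peer's weights instead of training from scratch, which is the intuition behind the reduced training time reported above.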
Given that applications increasingly use deep learning techniques, for which knowledge about a particular input class cannot be trivially attributed to a slice of the model, we build CLUE – a framework that facilitates knowledge transfer for neural networks. The system provides mechanisms for dynamically extracting significant parameters from a helper node's neural network and combines them with a multi-model, boosting-based approach to improve the predictive performance of the target node. CLUE enables collaborative learning approaches to be used with neural networks, improving model adaptability by up to 3.5x compared to learning in isolation while reducing data-movement costs by several orders of magnitude compared to federated learning. Although collaborative learning reduces backhaul data transfer and improves training time, the dynamic characteristics of the edge environment make neither federated nor collaborative learning strictly better than the other. In response, this thesis exploits the unique nature of the edge environment and proposes MLKeeper – new system support for an orchestration service that analyzes local and global trends to determine the learning mode at each node. Finally, each learning technique consists of components that perform distinct tasks. When such techniques are deployed, these components can be connected in different ways, forming different topologies. In existing systems, these tasks are tightly coupled with the system infrastructure: applications are built and deployed with fixed assumptions about their deployment configuration and the communication methodology among the distributed nodes, and cannot easily be adjusted in response to changes in workload or deployment parameters. This thesis introduces Flame, which provides a new abstraction that breaks these tasks into components that can be connected in a flexible and extensible manner.
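The two CLUE mechanisms described above can be sketched in a few lines of Python. The significance criterion (largest L2-norm rows of a weight matrix) and the ensemble rule (accuracy-weighted averaging of class probabilities) are stand-in assumptions for illustration only; CLUE's actual extraction and boosting logic is more involved, and all names here are hypothetical.

```python
import numpy as np

def significant_rows(weight_matrix, keep_frac=0.25):
    """Pick the rows (e.g. neurons or filters) of a helper's weight
    matrix with the largest L2 norm -- one simple notion of
    'significance' used here purely for illustration."""
    norms = np.linalg.norm(weight_matrix, axis=1)
    k = max(1, int(keep_frac * len(norms)))
    return np.argsort(norms)[-k:]

def weighted_ensemble(prob_list, accuracies):
    """Boosting-flavoured combination: weight each model's predicted
    class probabilities by its validation accuracy, then average."""
    w = np.asarray(accuracies, float)
    w = w / w.sum()
    return sum(wi * np.asarray(p, float) for wi, p in zip(w, prob_list))
```

The target node would load only the extracted rows from the helper into a side model, then serve predictions from the weighted ensemble of the local and transferred models, which is why only a small slice of the helper's network needs to cross the backhaul.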
This thesis proposes solutions to the challenges of machine learning in MEC that enhance the performance and flexibility of learning techniques, reduce the overheads of data movement and model transfer, and improve the adaptability of models in response to changes in workload, infrastructure, requirements, or scale. The proposed systems – Cartel, CLUE, MLKeeper, and Flame – offer novel ways to address these issues and have the potential to pave the way for efficient and effective machine learning in the dynamic, distributed MEC environment.
Date Issued
2023-08-01
Resource Type
Text
Resource Subtype
Dissertation