Title:
System support for Collaborative Learning in Distributed Edge Cloud Environment

Author(s)
Daga, Harshit
Advisor(s)
Gavrilovska, Ada
Abstract
Multi-access Edge Computing (MEC) provides a highly distributed environment for emerging classes of real-time applications. These applications rely on machine learning to create value from data at the edge by building a global shared model. Existing techniques provide good predictive performance but incur data-movement and model-transfer overheads in the critical path of learning. In addition, existing machine learning systems offer limited flexibility for modifying the configuration of a deployed learning application in response to changes in the workload, infrastructure, requirements, or scale. This thesis explores the unique nature of edge workloads to provide system support for a new form of collaborative learning, and aims to close the gaps in existing solutions through the following contributions. We build Cartel – a new system for collaborative machine learning at the edge cloud. A global shared model, although trained on a broader variety of data, may not be needed in full at each edge node. An alternative is to train and update models online at each edge with local data, in isolation from other edges. However, changes in the environment or variations in workload patterns at an edge node can degrade the predictive performance of its model and require adaptation. Cartel provides a jump start by transferring relevant knowledge from other edge node(s) where similar patterns have been observed. This allows for more lightweight models (up to 3x smaller), reduced backhaul data transfer (by a few orders of magnitude), and reduced training time, with performance comparable to centralized models and strictly improved model accuracy compared to learning in isolation. Collaborative learning relies on knowledge transfer mechanisms to enable collaboration among peers.
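The jump-start idea can be illustrated with a minimal Python sketch. This is not Cartel's implementation: the similarity metric (cosine similarity over per-class workload histograms), the threshold, and the whole-model weight transfer are all simplifying assumptions made here for illustration, and every name is hypothetical.

```python
import numpy as np

def workload_similarity(hist_a, hist_b):
    """Cosine similarity between two class-distribution histograms.
    (One illustrative metric; the system's actual metric may differ.)"""
    a, b = np.asarray(hist_a, float), np.asarray(hist_b, float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def jump_start(target_weights, peers, target_hist, threshold=0.9):
    """Seed the target node's model with weights from the most similar peer.
    `peers` maps node id -> (class histogram, model weights)."""
    best_id, best_sim = None, threshold
    for node_id, (hist, _) in peers.items():
        sim = workload_similarity(target_hist, hist)
        if sim >= best_sim:
            best_id, best_sim = node_id, sim
    if best_id is None:
        # No sufficiently similar peer: keep learning in isolation.
        return target_weights
    # Transfer the helper's knowledge as the starting point.
    return peers[best_id][1].copy()
```

A node whose workload histogram closely matches a peer's inherits that peer's weights instead of training from scratch, which is the intuition behind the reduced training time reported above.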
Given that applications increasingly use deep learning techniques, for which knowledge about a particular input class cannot be trivially attributed to a slice of the model, we build CLUE – a framework that facilitates knowledge transfer for neural networks. The system provides mechanisms for dynamically extracting significant parameters from a helper node's neural network and combines them with a multi-model, boosting-based approach to improve the predictive performance of the target node. CLUE enables collaborative learning approaches to be used with neural networks, improving model adaptability by up to 3.5x compared to learning in isolation while reducing data-movement costs by several orders of magnitude compared to federated learning. Although collaborative learning reduces backhaul data transfer and improves training time, the dynamic characteristics of the edge environment make neither federated nor collaborative learning strictly better than the other. In response, this thesis exploits the unique nature of the edge environment and proposes MLKeeper – new system support for an orchestration service that analyzes local and global trends to determine the learning mode at each node. Finally, each learning technique consists of components that perform distinct tasks. When such techniques are deployed, these components can be connected in different ways, forming different topologies. In existing systems, these tasks are tightly coupled with the system infrastructure: applications are built and deployed with fixed assumptions about their deployment configuration and the communication methodology among the distributed nodes, and cannot easily be adjusted in response to changes in workload or deployment parameters. This thesis introduces Flame, which provides a new abstraction that breaks these tasks into components that can be connected in a flexible and extensible manner.
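The two CLUE mechanisms described above can be sketched in a few lines of Python. The significance criterion (largest L2-norm rows of a weight matrix) and the ensemble rule (accuracy-weighted averaging of class probabilities) are stand-in assumptions for illustration only; CLUE's actual extraction and boosting logic is more involved, and all names here are hypothetical.

```python
import numpy as np

def significant_rows(weight_matrix, keep_frac=0.25):
    """Pick the rows (e.g. neurons or filters) of a helper's weight
    matrix with the largest L2 norm -- one simple notion of
    'significance' used here purely for illustration."""
    norms = np.linalg.norm(weight_matrix, axis=1)
    k = max(1, int(keep_frac * len(norms)))
    return np.argsort(norms)[-k:]

def weighted_ensemble(prob_list, accuracies):
    """Boosting-flavoured combination: weight each model's predicted
    class probabilities by its validation accuracy, then average."""
    w = np.asarray(accuracies, float)
    w = w / w.sum()
    return sum(wi * np.asarray(p, float) for wi, p in zip(w, prob_list))
```

The target node would load only the extracted rows from the helper into a side model, then serve predictions from the weighted ensemble of the local and transferred models, which is why only a small slice of the helper's network needs to cross the backhaul.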
This thesis proposes solutions to the challenges of machine learning in MEC that enhance the performance and flexibility of learning techniques, reduce the overheads of data movement and model transfer, and improve the adaptability of models in response to changes in workload, infrastructure, requirements, or scale. The proposed systems – Cartel, CLUE, MLKeeper, and Flame – offer novel ways to address these issues and have the potential to pave the way for efficient and effective machine learning in the dynamic, distributed MEC environment.
Date Issued
2023-08-01
Resource Type
Text
Resource Subtype
Dissertation