Distributed services with elastic container memory abstractions for big data clouds

Author(s)
Bae, Juhyun
Associated Organization(s)
Organizational Unit
School of Computer Science
School established in 2007
Abstract
Big data powered distributed services and applications have become an integral part of many products and services today. Yet they suffer large performance losses when their working set memory no longer fully fits in container memory, and they cannot leverage unused host and remote memory even when memory utilization is highly imbalanced across containers and across the cluster. They also suffer from limited scalability when the application logic handles traffic from many nodes in the cluster, because the resources of a single host node (e.g., RAM, network) are limited. This dissertation research addresses these problems by developing distributed services with an elastic container memory abstraction, which enables the container runtime to utilize free memory on the same host and on remote nodes of a cluster. It makes four unique contributions. (1) A transparent and elastic host memory abstraction that allows cross-container memory sharing based on the dynamic and transient memory demands of the container runtime. (2) An elastic remote network memory abstraction on top of RDMA that reduces the performance gap between local and remote memory. Several RDMA optimizations are introduced to provide an efficient communication fabric, enabling containers to leverage our cross-node memory abstraction and achieve maximum throughput. (3) A transparent network memory storage for container execution that allows cross-container and cross-node memory sharing. This enables containers to flexibly expand their memory to remote memory in the cluster in response to unexpected demand on the working set memory when host memory is insufficient to accommodate it. (4) Resilient decentralized communication protocols that support two types of distributed services with no inherent bottleneck and with high connectivity and reachability in the overlay network: a scalable and fault-tolerant network memory sharing system and a federated learning system.
Date
2022-05-03
Resource Type
Text
Resource Subtype
Dissertation