Distributed services with elastic container memory abstractions for big data clouds
Author(s)
Bae, Juhyun
Abstract
Distributed services and applications powered by big data have become an integral part of many products and services today. Yet they suffer large performance loss when their working set no longer fits fully in container memory, and they cannot leverage unused host and remote memory even when memory utilization is highly imbalanced across containers and across the cluster. They also suffer from limited scalability when the application logic handles traffic from many nodes in the cluster, because the resources of a single host node (e.g., RAM, network) are limited. This dissertation addresses these problems by developing distributed services with an elastic container memory abstraction, which enables the container runtime to utilize free memory on the same host and on remote nodes of a cluster. It makes four contributions. (1) A transparent and elastic host memory abstraction that allows cross-container memory sharing based on the dynamic and transient memory demands of the container runtime. (2) An elastic remote network memory abstraction on top of RDMA that reduces the performance gap between local and remote memory. Several RDMA optimizations are introduced to provide an efficient communication fabric, so that containers can leverage the cross-node memory abstraction and achieve maximum throughput. (3) Transparent network memory storage for container execution that allows cross-container and cross-node memory sharing. This enables containers to flexibly expand their memory into remote memory in the cluster when an unexpected increase in working set demand exceeds what host memory can accommodate. (4) Resilient decentralized communication protocols that support two types of distributed services with no inherent bottleneck and with high connectivity and reachability in the overlay network: a scalable and fault-tolerant network memory sharing system and a federated learning system.
Date
2022-05-03
Resource Type
Text
Resource Subtype
Dissertation