Memory System Optimizations for Parallel and Bandwidth-Intensive Workloads
Author(s)
Kadiyala, Divya Kiran
Abstract
The rapid proliferation of digital services—from web analytics and cloud platforms to social media and generative AI—has driven an unprecedented surge in data generation and processing demands, positioning data centers as the operational core of today’s digital economy. As hyperscale infrastructures expand to meet this exponential growth, the memory subsystem has emerged as a critical performance and cost bottleneck. Modern server nodes face mounting pressure to sustain high-capacity, high-bandwidth, and low-latency memory access as workloads increasingly rely on large in-memory datasets and parallel execution across thousands of cores. However, traditional scaling approaches—such as adding DRAM channels or relying on remote memory through RDMA—face physical, technological, and economic limitations. The resulting “memory wall” manifests as three interdependent challenges: limited capacity, constrained bandwidth, and rising latency, all intensified by the slowdown of Moore’s Law and the end of Dennard scaling.
This thesis addresses these challenges through a holistic, cross-layer co-design approach that enhances memory system performance across three hierarchical levels—chip, server, and cluster. At the chip level, HinTM introduces compiler- and hardware-assisted mechanisms to mitigate capacity aborts in Hardware Transactional Memory systems, thereby improving on-chip cache utilization and parallel execution efficiency. At the server level, SURGE dynamically harvests idle I/O bandwidth over CXL links to augment effective memory bandwidth and reduce memory access latency in bandwidth-bound workloads. Extending to the cluster scale, COMET provides a unified design-space exploration framework that jointly optimizes compute, memory, and interconnect provisioning for distributed AI and HPC workloads. Collectively, these contributions demonstrate that co-optimizing architectural mechanisms with workload and hardware characteristics can overcome the fundamental limitations of memory capacity and bandwidth scaling, enabling sustained performance improvements across modern datacenter systems.
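To make the chip-level problem concrete: a "capacity abort" occurs when a hardware transaction's read or write set overflows the cache space the HTM tracks, forcing the transaction to roll back, typically onto a coarse software lock. The toy Python sketch below models that standard retry-then-fallback usage pattern only; it is not HinTM's actual compiler/hardware mechanism, and the names (`L1_LINES`, `MAX_RETRIES`, `run_transaction`) are illustrative assumptions, not identifiers from the thesis.

```python
import threading

L1_LINES = 8        # toy model: cache lines the "HTM" can track
MAX_RETRIES = 3     # hardware-path retries before giving up

fallback_lock = threading.Lock()

def run_transaction(write_set, execute):
    """Attempt the hardware path; a write set larger than the tracked
    cache capacity models a capacity abort. After MAX_RETRIES failed
    attempts, serialize on a global fallback lock (the usual HTM
    fallback path, which sacrifices parallelism for progress)."""
    for _ in range(MAX_RETRIES):
        if len(write_set) <= L1_LINES:   # fits in cache: commit
            return execute(), "htm"
        # capacity abort: retrying alone cannot help, which is why
        # mitigating these aborts (as HinTM targets) matters
    with fallback_lock:                  # software fallback path
        return execute(), "lock"
```

Small transactions commit on the fast hardware path, while oversized ones always fall back and serialize, which is the parallel-efficiency loss that motivates mitigating capacity aborts in the first place.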
Date
2025-12
Resource Type
Text
Resource Subtype
Dissertation (PhD)