Improving Scalability of Deep Learning Accelerators Using 3D IC

Author(s)
Zhang, Canlin
Advisor(s)
Editor(s)
Associated Organization(s)
Supplementary to:
Abstract
Our work investigates and addresses the critical scalability challenges that flexible deep learning (DL) accelerators face in both interconnect and memory system design. As DL models grow increasingly diverse in their shapes, operators, and data reuse patterns, rigid architectures such as systolic arrays struggle to maintain high utilization, leading to inefficient execution on emerging workloads. Flexible accelerators, such as SARA and MAERI, overcome these utilization bottlenecks through reconfigurable compute arrays and adaptable dataflows. However, this flexibility introduces substantial physical design overhead, particularly in wirelength, area, power, and critical path timing. To address these bottlenecks, we propose architectural and physical design methodologies that leverage 3D integration to improve the scalability of flexible DL accelerators. First, we tackle the scalability limitations of flexible interconnects by identifying the key architectural bottlenecks, namely topology complexity and switch logic depth, and by introducing Logic-on-Logic 3D partitioning techniques. Our methods yield up to 3x improvement in timing, over 75% throughput gains, and reduced energy costs compared to 2D flexible designs. Second, we analyze memory system and datapath scalability issues specific to hierarchical accelerators like SARA. These architectures suffer from excessive memory port proliferation and datapath congestion as the processing element (PE) count increases. To address this, we apply 3D Memory-on-Logic integration and improved macro placement strategies. Our design demonstrates up to 1.24x runtime speedup, 1.4x improvement in energy-delay product (EDP), and 1.3x reduction in area compared to traditional 2D design baselines. These contributions validate our work's central argument: 3D integration is a powerful enabler of scalable, high-performance, and energy-efficient flexible DL accelerators. By co-optimizing architecture and physical design and integrating realistic simulation data and power, performance, and area (PPA) metrics, this work presents a practical and future-facing blueprint for accelerator scaling in the post-systolic era.
Sponsor
Date
2025-04-30
Extent
Resource Type
Text
Resource Subtype
Thesis
Rights Statement
Rights URI