Analysis and Implementation of Dense On-Die Memories Using Traditional and Compute-in-Memory Approaches
Author(s)
Spetalnick, Samuel D.
Abstract
For processors and application accelerators executing data-intensive tasks, including machine learning inference, accessing data from dense memory arrays can be a limiting factor for energy efficiency and/or throughput (performance). The high cost of off-chip accesses has led to the extensive use of large, dense on-chip memories. The potentially large leakage energy and constrained density of traditional on-chip volatile memory (typically static random-access memory, SRAM) have encouraged exploration of a new generation of embedded memory technologies, including emerging logic-compatible embedded nonvolatile memory (eNVM) technologies such as resistive random-access memory (RRAM). The cost of accessing data has also motivated the re-emergence of the analog current-summing compute-in-memory (CIM) paradigm, where the stored states in multiple memory cells are scaled and added together inside the memory array so that the read-out value represents a multiply-accumulate (MAC) operation result. This potentially improves the bandwidth and efficiency of data accesses and reduces the area and energy of digital MAC computation. The objective of the presented research is to use modeling and simulation, along with implemented test chips, to investigate design challenges (non-idealities, performance, and density) for dense on-die memories with and without CIM. After a background discussion, this dissertation includes an analysis of current-summing CIM with SRAM that uses modeling and simulation to draw conclusions about CIM prospects. Next, two current-summing CIM macro implementations using RRAM are described. The first improves memory density over predecessor work by addressing physical design challenges, while the second addresses CIM non-idealities: channel-to-channel offset and gain mismatch, IR drop, and off-state current.
Measured results for both macros, including specific measurements for the non-idealities addressed by the second macro, are presented. The final part of this research is another implementation with RRAM. This system-level all-on-chip all-digital (non-CIM) inference accelerator work uses an end-to-end approach, with a tiled modular matrix unit, to improve access costs for eNVM while maintaining or improving nonvolatile memory density.
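The current-summing idea described in the abstract can be illustrated with a small behavioral sketch. This is not the dissertation's model or circuit: the function names, the binary input/weight encoding, and every numeric value (read voltage, on/off conductances, gain, offset) are assumptions chosen only for illustration, and IR drop is not modeled.

```python
# Behavioral sketch of a current-summing CIM column (illustrative assumptions only).

def column_current(inputs, weights, v_read=0.2, g_on=100e-6, g_off=0.0):
    """Summed bitline current in amperes for binary inputs and binary weights.

    A cell storing 1 is modeled with conductance g_on and a cell storing 0
    with g_off; a nonzero g_off represents off-state current.
    """
    total = 0.0
    for x, w in zip(inputs, weights):
        g = g_on if w else g_off
        total += x * v_read * g  # only activated rows (x = 1) contribute
    return total


def readout(current, v_read=0.2, g_on=100e-6, gain=1.0, offset=0.0):
    """Map a column current to a MAC estimate in units of one on-cell current.

    gain and offset stand in for per-channel gain and offset mismatch.
    """
    unit = v_read * g_on
    return gain * current / unit + offset


inputs = [1, 0, 1, 1]   # hypothetical row activations
weights = [1, 1, 0, 1]  # hypothetical stored bits

ideal = sum(x * w for x, w in zip(inputs, weights))  # digital reference MAC
est_ideal = readout(column_current(inputs, weights))
est_nonideal = readout(
    column_current(inputs, weights, g_off=5e-6),  # assumed off-state leakage
    gain=1.02,    # assumed channel gain error
    offset=0.1,   # assumed channel offset
)

print(ideal, est_ideal, est_nonideal)
```

With ideal parameters the read-out matches the digital MAC; with the assumed off-state conductance, gain error, and offset, the estimate drifts away from it, which is the kind of deviation that calibration or circuit techniques would need to address.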
Date
2023-11-17
Resource Type
Text
Resource Subtype
Dissertation