Cryogenic CMOS Circuits for High Performance Digital Systems

Author(s)
Saligram, Rakshith
Editor(s)
Associated Organization(s)
Series
Supplementary to:
Abstract
There has been an ever-increasing demand for energy-efficient processors, more so recently with the emergence of Artificial Intelligence, Machine Learning and Large Language Models. Cryogenic computing is a transformative technology that uses ultra-low temperatures (-196°C/77K) to achieve higher performance and/or better energy efficiency. Superior device characteristics, such as higher device drive current, lower subthreshold slope, ultra-low subthreshold leakage and lower interconnect resistance, open a multitude of design opportunities at both the circuit and system levels. They also enable memory technologies that were otherwise lost to technological evolution. In this work, we show how the better device and interconnect properties translate to faster, smaller and lower-power systems. To do so, we build in-house cryogenic device models that are well calibrated to experimentally measured data for both transistors and wires. We use circuit concepts to measure and calibrate the interconnect resistance on a chip taped out in a standard 22nm FDSOI foundry process. The models are robust, scalable and based on industry-standard platforms, and they aid cryogenic circuit simulation for the design of higher-order systems. We characterize a matrix multiplication accelerator test chip, built in a 40nm CMOS process, across temperature and demonstrate an energy efficiency improvement of up to 26%. We propose new biasing techniques for dynamic logic circuits which, when benchmarked on a 64-bit domino logic adder, consume 41% less energy than the room-temperature counterpart. We also co-optimize the design of 6T SRAM with technology to allow for supply voltage scaling in the presence of variation while providing 5.4× lower energy and 1.2× lower delay. We further demonstrate a 28nm hybrid 2T gain cell embedded DRAM test chip capable of operating from 4K to 300K.
The memory macro shows 1.7× higher energy efficiency, more than 10^6× higher retention time and a lower refresh rate at low temperature. Finally, we present a design-technology co-optimized benchmarking of a 64-bit Arm processor. More than 12 different standard cell libraries are recharacterized at multiple temperatures and supply voltages to execute full-fledged automatic place-and-route runs. The designs are then analyzed for power, performance and area, and the results show more than 4× improvement in energy efficiency at low temperature. We also benchmark the thermal behavior of the system and show that cryogenic computing can theoretically allow a higher number of chips to be packaged together at a given thermal design power limit. Lastly, the cooling cost is analyzed, and key roadblocks are identified along with possible future work.
Sponsor
Date
2024-07-17
Extent
Resource Type
Text
Resource Subtype
Dissertation
Rights Statement
Rights URI