Near memory hardware accelerators for real-time radio frequency signal computation
Author(s)
Mukherjee, Mandovi
Abstract
Computation in real-time systems such as radar, process control, and Advanced Driver Assistance Systems (ADAS) presents three challenges: handling sample-by-sample processing, generating output responses at low and deterministic latency, and enabling dynamic reconfigurability. Radio frequency (RF) real-time systems are particularly interesting examples, since the rate of incoming samples is inherently very high, computations must be completed with very low latency, and the systems often involve very large-scale configurations with a lot of data movement. Field Programmable Gate Array (FPGA) based accelerators have been designed to process RF computations in real-time systems, but they cannot simultaneously handle the high throughput (>200 MHz) and low latency (on the order of µs) required in wide-bandwidth systems, emphasizing the need for custom accelerators.
In-memory and near-memory accelerators have shown promise for hardware-accelerated computation involving a large volume of data. Processing-in-Memory (PIM) has proved to be an elegant solution for accelerating Vector Matrix Multiplication (VMM), which forms the backbone of computation in traditional as well as RF machine learning. First, this thesis demonstrates a processing-in-memory accelerator based on localized multifunctional control, with support for VMM with flexible precision, floating-point numbers, and complex numbers. The test chip is fabricated in 65 nm CMOS and shows a measured compute efficiency, normalized to memory size, of 34 GOPS/W/KB at 177 MHz. The PIM accelerator with multifunctional control can enable in-memory RF machine learning and linear algebraic signal processing, and may be suitable for dense computation in real-time systems, but it is restricted in terms of latency.
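As a point of reference for the arithmetic only, the complex-valued VMM mentioned above can be sketched in plain C++. This is a minimal software sketch, not the thesis's in-memory implementation: the names `Cplx` and `complex_vmm`, the row-major weight layout, and the use of single-precision floats are all assumptions made for illustration.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <vector>

// Hypothetical sample type; the thesis supports flexible precision,
// single-precision float is chosen here only for illustration.
using Cplx = std::complex<float>;

// Computes y = x * W, where x has R entries and W is an R x C matrix
// stored as R rows of C entries each: y[j] = sum_i x[i] * W[i][j].
std::vector<Cplx> complex_vmm(const std::vector<Cplx>& x,
                              const std::vector<std::vector<Cplx>>& w) {
    assert(x.size() == w.size());  // one weight row per input element
    const std::size_t cols = w.empty() ? 0 : w[0].size();
    std::vector<Cplx> y(cols, Cplx{0.0f, 0.0f});
    for (std::size_t i = 0; i < w.size(); ++i) {
        assert(w[i].size() == cols);  // rectangular matrix
        for (std::size_t j = 0; j < cols; ++j) {
            // Each complex multiply-accumulate expands to four real
            // multiplies and four real additions.
            y[j] += x[i] * w[i][j];
        }
    }
    return y;
}
```

Every output element needs every weight in its column, which is why keeping the weights in or near the memory array, rather than moving them to a separate compute unit, is attractive for this kernel.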
Second, the thesis proposes an ASIC-based near-memory distributed control architecture for storage and distribution of real-time streaming digital RF data with simultaneous optimization of throughput and latency, specifically for sparse calculations. A small-scale prototype of the distributed control is fabricated in 28 nm CMOS for application to a real-time RF emulator testbed that requires high throughput and deterministic, low latency. A C++-based cycle-level implementation of the proposed architecture, an analysis of a sparse Finite Impulse Response (FIR) filtering application, and measurement results from the 28 nm CMOS test chip validate the proposed autonomous distributed control. Finally, the thesis considers a larger-scale design of the proposed distributed control and discusses its end-to-end implementation in 28 nm CMOS along with simulation results.
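To illustrate what a sparse FIR computation on streaming samples looks like, here is a minimal C++ sketch that processes one complex sample per call. It is a generic textbook formulation, not the thesis's cycle-level model or its distributed control scheme: the names `Sample`, `SparseTap`, and `SparseFir`, and the circular-buffer layout, are assumptions made for illustration.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <utility>
#include <vector>

using Sample = std::complex<float>;  // hypothetical sample type

// One nonzero tap: contributes coeff * x[n - delay] to y[n].
struct SparseTap {
    std::size_t delay;
    Sample coeff;
};

// Streaming sparse FIR: only the nonzero taps are stored and evaluated.
class SparseFir {
public:
    SparseFir(std::vector<SparseTap> taps, std::size_t max_delay)
        : taps_(std::move(taps)), history_(max_delay + 1), head_(0) {
        for (const SparseTap& t : taps_) {
            assert(t.delay <= max_delay);  // tap must fit in the history
        }
    }

    // Consumes one input sample and returns one output sample.
    Sample push(Sample x) {
        history_[head_] = x;  // overwrite the oldest stored sample
        const std::size_t n = history_.size();
        Sample y{0.0f, 0.0f};
        for (const SparseTap& t : taps_) {
            // Index of the sample that arrived t.delay pushes ago.
            y += t.coeff * history_[(head_ + n - t.delay) % n];
        }
        head_ = (head_ + 1) % n;
        return y;
    }

private:
    std::vector<SparseTap> taps_;
    std::vector<Sample> history_;  // circular buffer, zero-initialized
    std::size_t head_;
};
```

The work per output sample scales with the number of nonzero taps rather than with the full delay span, while the stored history still covers the whole span. In this sketch that makes the filter dominated by storing and fetching delayed samples rather than by arithmetic.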
Date
2023-06-06
Resource Type
Text
Resource Subtype
Dissertation