Title:
Accelerated deep learning for the edge-to-cloud continuum: A specialized full stack derived from algorithms

dc.contributor.advisor Esmaeilzadeh, Hadi
dc.contributor.advisor Kim, Hyesoon
dc.contributor.advisor Prvulovic, Milos
dc.contributor.advisor Krishna, Tushar
dc.contributor.advisor Chandra, Vikas
dc.contributor.author Sharma, Hardik
dc.contributor.department Electrical and Computer Engineering
dc.date.accessioned 2019-05-29T14:03:31Z
dc.date.available 2019-05-29T14:03:31Z
dc.date.created 2019-05
dc.date.issued 2019-03-29
dc.date.submitted May 2019
dc.date.updated 2019-05-29T14:03:31Z
dc.description.abstract Advances in high-performance computer architecture have been a major driver for the rapid evolution of Deep Neural Networks (DNNs). Due to their insatiable demand for compute power, both the research community and industry have naturally turned to accelerators to accommodate modern DNN computation. Furthermore, DNNs are gaining prevalence and have found applications across a wide spectrum of devices, from commodity smartphones to enterprise cloud platforms. However, there is no one-size-fits-all solution for this continuum of devices that can meet the strict energy/power/chip-area budgets of edge devices while also meeting the high performance requirements of enterprise-grade servers. To this end, this thesis designs a specialized compute stack for DNN acceleration across the edge-to-cloud continuum that flexibly matches the varying constraints of different devices and simultaneously exploits algorithmic properties to maximize the benefits of acceleration. First, this thesis explores a tight integration of Neural Network (NN) accelerators within massively-parallel GPUs with minimal area overhead. We show that tightly coupling NN accelerators and GPUs can provide significant gains in performance and energy efficiency across a diverse set of applications through neural acceleration, which approximates regions of approximation-amenable code using neural networks. Next, this thesis develops a full stack for accelerating DNN inference on FPGAs that aims to provide programmability, performance, and efficiency. We call our specialized compute stack DNNWEAVER; it encompasses (1) high-level algorithmic abstractions, (2) a flexible template accelerator architecture, and (3) a compiler that automatically and efficiently optimizes the template architecture to maximize DNN performance using the limited resources available on the FPGA die.
The third thrust of this thesis explores scale-out acceleration of training using cloud-scale FPGAs for a wide range of machine learning algorithms, including neural networks. The challenge here is to design an accelerator architecture that can scale up to efficiently use the large pool of compute resources available on modern cloud-grade FPGAs. To tackle this challenge, this thesis explores multi-threading to maximize the efficiency of FPGA acceleration by running multiple parallel threads of training. The fourth thrust of this thesis builds upon the algorithmic insight that the bitwidth of operations in DNNs can be reduced without compromising their classification accuracy. However, to prevent loss of accuracy, the bitwidth varies significantly across DNNs and may even be adjusted for each layer individually. Thus, a fixed-bitwidth accelerator would either offer limited benefits, since it must accommodate the worst-case bitwidth requirements, or inevitably degrade final accuracy. To alleviate these deficiencies, this thrust introduces dynamic bit-level fusion/decomposition as a new dimension in the design of DNN accelerators. The final thrust of this thesis explores mixed-signal acceleration to push accelerator efficiency to its limits. Specifically, it executes the low-bitwidth multiply-add operations prevalent in DNNs in the analog domain to gain significant efficiency benefits. Using low-bitwidth analog compute units enables us to overcome the limited range for information encoding, susceptibility to noise, and Analog-to-Digital (A/D) conversion overheads.
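The arithmetic behind bit-level fusion/decomposition can be illustrated with a small software sketch. This is a hedged illustration, not the thesis's hardware design: the function names are hypothetical, and the real accelerator composes low-bitwidth multipliers spatially in silicon rather than in a loop. The sketch shows how an 8-bit multiply can be reconstructed by shifting and adding 2-bit × 2-bit partial products, which is why an array of narrow multipliers can dynamically "fuse" to serve wider operands.

```python
def bit_slices(value, width=2, count=4):
    """Split an unsigned integer into `count` low-order-first slices of `width` bits."""
    mask = (1 << width) - 1
    return [(value >> (width * i)) & mask for i in range(count)]

def fused_multiply(x, y, width=2, count=4):
    """Compose an 8-bit x 8-bit product from 2-bit x 2-bit partial products.

    Each partial product is shifted by the combined slice positions and
    accumulated -- the software analogue of fusing narrow multipliers.
    """
    total = 0
    for i, xs in enumerate(bit_slices(x, width, count)):
        for j, ys in enumerate(bit_slices(y, width, count)):
            total += (xs * ys) << (width * (i + j))
    return total

# The fused result matches a native full-width multiply.
assert fused_multiply(173, 92) == 173 * 92
```

Because the slice width and count are parameters, the same hardware substrate can act as many 2-bit multipliers for aggressively quantized layers or fuse into fewer 4-bit or 8-bit multipliers for layers that need more precision.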
dc.description.degree Ph.D.
dc.format.mimetype application/pdf
dc.identifier.uri http://hdl.handle.net/1853/61267
dc.language.iso en_US
dc.publisher Georgia Institute of Technology
dc.subject Bit level composability
dc.subject Dynamic composability
dc.subject Deep neural networks
dc.subject Accelerators
dc.subject DNN
dc.subject Convolutional neural networks
dc.subject CNN
dc.subject Long short-term memory
dc.subject LSTM
dc.subject Recurrent neural networks
dc.subject RNN
dc.subject Quantization
dc.subject Bit fusion
dc.subject DnnWeaver
dc.subject FPGA
dc.subject ASIC
dc.title Accelerated deep learning for the edge-to-cloud continuum: A specialized full stack derived from algorithms
dc.type Text
dc.type.genre Dissertation
dspace.entity.type Publication
local.contributor.advisor Prvulovic, Milos
local.contributor.advisor Kim, Hyesoon
local.contributor.advisor Krishna, Tushar
local.contributor.corporatename School of Electrical and Computer Engineering
local.contributor.corporatename College of Engineering
relation.isAdvisorOfPublication 2d678067-bb81-43c7-be94-bd87bced736e
relation.isAdvisorOfPublication ec222ec7-e853-445c-b356-51b942d36799
relation.isAdvisorOfPublication f80c3b14-cd42-456d-b440-addf20372fbc
relation.isOrgUnitOfPublication 5b7adef2-447c-4270-b9fc-846bd76f80f2
relation.isOrgUnitOfPublication 7c022d60-21d5-497c-b552-95e489a06569
thesis.degree.level Doctoral
Files
Original bundle
Name: SHARMA-DISSERTATION-2019.pdf
Size: 3.87 MB
Format: Adobe Portable Document Format
License bundle
Name: LICENSE.txt
Size: 3.87 KB
Format: Plain Text