Title:
Fully Distributed Register Files for Heterogeneous Clustered Microarchitectures

dc.contributor.advisor Wills, D. Scott
dc.contributor.advisor Wills, Linda M.
dc.contributor.author Bunchua, Santithorn en_US
dc.contributor.committeeMember Blough, Douglas M.
dc.contributor.committeeMember Heck, Bonnie S.
dc.contributor.committeeMember Lee, Hsien-Hsin S.
dc.contributor.committeeMember Prvulovic, Milos
dc.contributor.department Electrical and Computer Engineering en_US
dc.date.accessioned 2005-03-02T22:21:46Z
dc.date.available 2005-03-02T22:21:46Z
dc.date.issued 2004-07-09 en_US
dc.description.abstract Conventional processor design utilizes a central register file and a bypass network to deliver operands to and from functional units, an approach that cannot scale to a large number of functional units. As more functional units are integrated into a processor, the number of ports on a register file grows linearly while area, delay, and energy consumption grow even more rapidly. Physical properties of a bypass network scale in a similar manner. In this dissertation, a fully distributed register file organization is presented to overcome this limitation by relying on small register files with fewer ports and localized operand bypasses. Unlike other clustered microarchitectures, each cluster features a small single-issue functional unit coupled with a small local register file. Several clusters are used, and each of them can be different. All register files are connected through a register transfer network that supports multicast communications. Techniques to support distributed register file operations are presented for both dynamically and statically scheduled processors. These include the eager and multicast register transfer mechanisms in the dynamic approach and the global data routing with multicasting algorithm in the static approach. Although this organization requires additional cycles to execute a program, the cost is offset by significant savings obtained through smaller area, faster operand access time, and lower energy consumption. With faster operating frequency and more efficient hardware implementation, overall performance can be improved. Additionally, the fully distributed register file organization is applied to an ILP-SIMD processing element, which is the major building block of a massively parallel media processor array. The results show a reduction in die area, which can be utilized to implement additional processing elements. Consequently, performance is improved through a higher degree of data parallelism in a larger processor array. In summary, the fully distributed register file architecture permits future processors to scale to a large number of functional units. This is especially desirable in high-throughput processors such as wide-issue processors and multithreaded processors. Moreover, localized communication is highly desirable in the transition to future deep submicron technologies, since long wires are a critical issue in processes with extremely small feature sizes. en_US
dc.description.degree Ph.D. en_US
dc.format.extent 809442 bytes
dc.format.mimetype application/pdf
dc.identifier.uri http://hdl.handle.net/1853/5041
dc.language.iso en_US
dc.publisher Georgia Institute of Technology en_US
dc.subject Clustered microarchitecture en_US
dc.subject Distributed register file
dc.title Fully Distributed Register Files for Heterogeneous Clustered Microarchitectures en_US
dc.type Text
dc.type.genre Dissertation
dspace.entity.type Publication
local.contributor.advisor Wills, Linda M.
local.contributor.corporatename School of Electrical and Computer Engineering
local.contributor.corporatename College of Engineering
relation.isAdvisorOfPublication c965b932-6dbb-46d3-8e30-6d7809f2f9b6
relation.isOrgUnitOfPublication 5b7adef2-447c-4270-b9fc-846bd76f80f2
relation.isOrgUnitOfPublication 7c022d60-21d5-497c-b552-95e489a06569
Files
Original bundle
Name:
bunchua_santithorn_200407_phd.pdf
Size:
790.47 KB
Format:
Adobe Portable Document Format