Understanding The Challenges with Using Hardware Pre-Fetchers for CPU-Based Matrix Multiply Units

Author(s)
Goldstein, Michael
Advisor(s)
Editor(s)
Associated Organization(s)
Organizational Unit
School of Computer Science
School established in 2007
Supplementary to:
Abstract
Advances in machine learning have made the Graphics Processing Unit (GPU) the de facto standard accelerator for large-scale matrix computation. This work examines how difficult it is to use a traditional Central Processing Unit (CPU), with matrix accelerators and prefetching hardware integrated into the core instruction set, for Artificial Intelligence (AI) and Large Language Model (LLM)-like workloads. Specifically, it focuses on using stream prefetchers to optimize the performance of the tile multiply instructions on Intel’s Sapphire Rapids processors, which are part of the Advanced Matrix Extensions (AMX) instruction set.
Sponsor
Date
2024-08-26
Extent
Resource Type
Text
Resource Subtype
Thesis
Rights Statement
Rights URI