Understanding The Challenges with Using Hardware Pre-Fetchers for CPU-Based Matrix Multiply Units
Author(s)
Goldstein, Michael
Abstract
Many advances in the machine learning field have led to the Graphics Processing Unit (GPU) becoming the de facto standard accelerator for large-scale matrix computation. This work examines how difficult it is to use a traditional Central Processing Unit (CPU), equipped with matrix accelerators integrated into its core instruction set and with hardware prefetchers, for Artificial Intelligence (AI) and Large Language Model (LLM)-like workloads. Specifically, it focuses on using stream prefetchers to optimize the performance of the tile multiply instructions on Intel's Sapphire Rapids processors, which belong to the Advanced Matrix Extensions (AMX) instruction set.
Date
2024-08-26
Resource Type
Text
Resource Subtype
Thesis