Compiler Guided Scheduling: A Cross-Stack Approach For Performance Elicitation
Author(s)
Mururu, Girish
Abstract
Modern software executes on multi-core systems that share resources such as several levels of the memory hierarchy (caches, main memory, secondary storage), I/O devices, and network interfaces. In such a co-execution environment, software performance suffers from the resource conflicts that this sharing creates. Resource requirements vary not only across processes but also during the execution of a single process. Current resource management techniques involving OS schedulers have evolved from, and mainly rely on, the principles of fairness (achieved through time-multiplexing) and load balancing, and are oblivious to the dynamic resource requirements of individual processes. Compiler research, on the other hand, has traditionally focused on optimizing single- and multi-threaded programs within one process. However, compilers can analyze a process's resource requirements. This thesis contends that significant performance gains can be achieved when the compiler guides schedulers with information about dynamic program characteristics and resource needs.
Towards compiler-guided scheduling, we first look at the problem of process migration. For load-balancing purposes, OS schedulers such as CFS can migrate threads while they are in the middle of an intense memory reuse region, destroying warmed-up caches and TLBs. To solve this problem while leaving enough flexibility for load balancing, we propose PinIt, which first determines the regions of a program in which the process should be pinned to a core so that adverse migrations causing excessive cache and TLB misses are avoided. The thesis proposes new measures, such as unique memory reuse and memory reuse density, that capture the performance penalties incurred due to migration. The compiler encapsulates regions with high penalties in pin/unpin calls that prevent migrations. In an overloaded environment, compared to priority-cfs, PinIt speeds up high-priority applications in mediabench workloads by 1.16x and 2.12x and in computer vision-based workloads by 1.35x and 1.23x on 8 cores and 16 cores, respectively, with almost the same or better throughput for low-priority applications.
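The pin/unpin idea can be illustrated with a minimal sketch. It is not PinIt's implementation: the helper name `pinned_region`, the choice of core, and the toy kernel are all assumptions made for illustration, and the sketch relies on the Linux-only `os.sched_setaffinity` call (it degrades to a no-op elsewhere). It shows a reuse-heavy region wrapped so that the process cannot be migrated while the region runs, with the original affinity mask restored afterwards so the scheduler can load-balance again.

```python
import os
from contextlib import contextmanager


@contextmanager
def pinned_region():
    """Hypothetical pin/unpin wrapper: restrict the calling process to a
    single core for the duration of the block, then restore its mask."""
    if not hasattr(os, "sched_setaffinity"):
        # Affinity calls are Linux-only; elsewhere this is a no-op.
        yield None
        return
    original = os.sched_getaffinity(0)   # set of cores currently allowed
    core = min(original)                 # assumption: any allowed core will do
    os.sched_setaffinity(0, {core})      # "pin"
    try:
        yield core
    finally:
        os.sched_setaffinity(0, original)  # "unpin"


def reuse_heavy_kernel(n=256, passes=50):
    """Toy stand-in for a region that repeatedly touches the same data."""
    data = list(range(n))
    total = 0
    for _ in range(passes):
        total += sum(data)
    return total


has_affinity = hasattr(os, "sched_getaffinity")
mask_before = os.sched_getaffinity(0) if has_affinity else None

with pinned_region() as core:
    mask_inside = os.sched_getaffinity(0) if has_affinity else None
    result = reuse_heavy_kernel()

mask_after = os.sched_getaffinity(0) if has_affinity else None
```

In this sketch the programmer places the wrapper by hand; in the approach described above, the compiler decides where such calls go based on the reuse measures.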
The problems of co-scheduling and co-location of processes that share resources must be solved for efficiency in a co-execution environment. Several approaches proposed in the literature rely on static profile data or dynamic performance-counter information, which inherently cannot be used in an anticipatory (proactive) manner and therefore lead to suboptimal scheduling. This thesis proposes Beacons, a generic framework that instruments programs with generated models or equations of specific program characteristics and provides a runtime counterpart that delivers the dynamically generated information to the scheduler. We develop a novel timing analysis for the duration of a loop that is on average 84% accurate on Polybench and Rodinia benchmarks, and embed it, along with memory footprint and locality classification information, into beacons. The thesis presents two schedulers: the Beacon Enabled Scheduler (BES), which targets the co-scheduling problem by maximizing throughput, and Bellator, which targets the co-location problem by minimizing latency while maintaining fairness. A prototype of BES improves throughput over the default Linux scheduler (CFS) by up to 4.7x on ThunderX and up to 5.2x on ThunderX2 servers for consolidated workloads. A prototype of Bellator on ThunderX2 with 224 hardware threads lowers 100th-percentile latency by 14% on average while executing 108 and 162 simultaneous processes.
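As a rough illustration of how a beacon could carry proactive information to a scheduler, the sketch below evaluates simple per-loop models at runtime and uses them for a co-scheduling decision. Everything here is hypothetical: the `Beacon` fields, the linear models (cost per iteration times trip count), the two locality labels, and the greedy cache-fitting policy are assumptions for illustration, not the models or the BES policy from the thesis.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Beacon:
    """Hypothetical beacon payload emitted just before a loop starts."""
    pid: int
    predicted_seconds: float   # from a loop timing model
    footprint_bytes: int       # predicted memory footprint of the loop
    locality: str              # "reuse" or "streaming" (assumed labels)


def make_loop_beacon(pid, trip_count, seconds_per_iter, bytes_per_iter, locality):
    """Evaluate assumed linear models once the trip count is known at runtime."""
    return Beacon(
        pid=pid,
        predicted_seconds=trip_count * seconds_per_iter,
        footprint_bytes=trip_count * bytes_per_iter,
        locality=locality,
    )


def pick_co_runners(beacons, cache_bytes):
    """Greedy sketch: admit reuse-heavy loops while their combined footprint
    fits in the shared cache; streaming loops gain little from the cache,
    so they are admitted without being charged against the budget."""
    chosen, used = [], 0
    for b in sorted(beacons, key=lambda b: b.footprint_bytes):
        if b.locality == "streaming":
            chosen.append(b.pid)
        elif used + b.footprint_bytes <= cache_bytes:
            chosen.append(b.pid)
            used += b.footprint_bytes
    return chosen


beacons = [
    make_loop_beacon(1, trip_count=1000, seconds_per_iter=2e-6,
                     bytes_per_iter=4096, locality="reuse"),
    make_loop_beacon(2, trip_count=2000, seconds_per_iter=2e-6,
                     bytes_per_iter=8192, locality="reuse"),
    make_loop_beacon(3, trip_count=500, seconds_per_iter=1e-6,
                     bytes_per_iter=1024, locality="streaming"),
]
selected = pick_co_runners(beacons, cache_bytes=8 * 1024 * 1024)
print(selected)  # [3, 1]
```

The point of the sketch is the timing of the information: the footprint and duration are known before the loop runs, so the scheduler can decide ahead of time rather than react to performance counters after contention has already occurred.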
The thesis provides a preview of how beacons carrying cache-miss information can enable secure co-location of processes in a multi-tenant environment by detecting and mitigating cache-based side-channel attacks. Our beacon-based scheduler solution detects and mitigates attacks that use the well-known cache-based side-channel techniques (Prime+Probe, Flush+Reload, and Flush+Flush) on OpenSSL cryptography algorithms in multi-tenant environments.
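One way to picture detection from cache-miss information is a comparison between the misses a beacon predicts for a region and the misses actually observed. The sketch below is an assumption-laden illustration, not the detection method of the thesis: the function name `flag_unexpected_misses`, the dictionary inputs, and the fixed tolerance factor are all hypothetical.

```python
def flag_unexpected_misses(expected_misses, observed_misses, tolerance=2.0):
    """Return the pids whose observed cache misses exceed the predicted
    count by more than `tolerance` times. Such processes may be suffering
    interference from a co-located process and are candidates for
    relocation (a hypothetical mitigation)."""
    flagged = set()
    for pid, observed in observed_misses.items():
        expected = expected_misses.get(pid)
        if expected is not None and observed > tolerance * expected:
            flagged.add(pid)
    return flagged


# Hypothetical numbers: pid 7 sees far more misses than its beacon predicted.
expected = {7: 1_000, 8: 50_000}
observed = {7: 9_500, 8: 52_000, 9: 123}   # pid 9 sent no beacon, so it is ignored
suspicious = flag_unexpected_misses(expected, observed)
print(suspicious)  # {7}
```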
Date
2020-08-19
Resource Type
Text
Resource Subtype
Dissertation