Title:
Efficient resource sharing for big data applications in shared clusters

dc.contributor.advisor Pu, Calton
dc.contributor.author Li, Jack
dc.contributor.committeeMember Liu, Lin
dc.contributor.committeeMember Navathe, Shamkant B.
dc.contributor.committeeMember Omiecinski, Edward R.
dc.contributor.committeeMember Wang, Qingyang
dc.contributor.department Computer Science
dc.date.accessioned 2016-08-22T12:22:27Z
dc.date.available 2016-08-22T12:22:27Z
dc.date.created 2016-08
dc.date.issued 2016-05-20
dc.date.submitted August 2016
dc.date.updated 2016-08-22T12:22:27Z
dc.description.abstract Modern data centers are shifting to shared clusters where the resources are shared among multiple users and frameworks. A key enabler for such shared clusters is a cluster resource management system that allocates resources among different frameworks. One key problem in these shared clusters is how to efficiently share cluster resources between multiple applications and users in an elastic and non-disruptive manner. Current cluster schedulers typically use kill-based preemption to coordinate resource sharing, achieve fairness, and satisfy SLOs during resource contention by simply killing low-priority jobs and restarting them later when resources are available. This simple preemption policy ensures fast service times for high-priority jobs and prevents a single user or application from occupying too many resources and starving others; however, because it does not save the progress of preempted jobs, this policy causes significant resource waste and delays the response time of long-running or low-priority jobs. Dynamic resource sharing becomes even more problematic when different types of applications run on the same cluster (e.g., batch processing systems running alongside real-time streaming systems). Different application types often have different quality-of-service metrics (e.g., higher throughput versus lower latency), which can make resource sharing among these applications contentious. In this dissertation, we show the impact of kill-based preemption in modern shared clusters and propose two solutions for sharing resources more efficiently in shared cluster environments: utilizing checkpoint-based preemption and supporting elasticity in distributed data stream processing systems.
dc.description.degree Ph.D.
dc.format.mimetype application/pdf
dc.identifier.uri http://hdl.handle.net/1853/55597
dc.language.iso en_US
dc.publisher Georgia Institute of Technology
dc.subject Shared clusters
dc.subject Resource management
dc.subject Cloud
dc.subject Preemption
dc.subject Scheduling
dc.subject Multi-tenancy
dc.subject Distributed stream processing
dc.subject Elasticity
dc.title Efficient resource sharing for big data applications in shared clusters
dc.type Text
dc.type.genre Dissertation
dspace.entity.type Publication
local.contributor.advisor Pu, Calton
local.contributor.corporatename College of Computing
local.contributor.corporatename School of Computer Science
relation.isAdvisorOfPublication fc48a3de-da43-4d32-af59-414047eb7cd7
relation.isOrgUnitOfPublication c8892b3c-8db6-4b7b-a33a-1b67f7db2021
relation.isOrgUnitOfPublication 6b42174a-e0e1-40e3-a581-47bed0470a1e
thesis.degree.level Doctoral
Files
Original bundle
Name: LI-DISSERTATION-2016.pdf
Size: 1.28 MB
Format: Adobe Portable Document Format
License bundle
Name: LICENSE.txt
Size: 3.86 KB
Format: Plain Text