Title: ClusterWatch: Flexible, Lightweight Monitoring for High-end GPGPU Clusters

dc.contributor.author Slawinska, Magdalena
dc.contributor.author Schwan, Karsten
dc.contributor.author Eisenhauer, Greg
dc.contributor.corporatename Georgia Institute of Technology. Center for Experimental Research in Computer Systems en_US
dc.contributor.corporatename Georgia Institute of Technology. College of Computing en_US
dc.date.accessioned 2015-06-09T15:25:33Z
dc.date.available 2015-06-09T15:25:33Z
dc.date.issued 2013
dc.description.abstract The ClusterWatch middleware provides runtime flexibility in what system-level metrics are monitored, how frequently such monitoring is done, and how metrics are combined to obtain reliable information about the current behavior of GPGPU clusters. Interesting attributes of ClusterWatch are (1) the ease with which different metrics can be added to the system, by simply deploying additional “cluster spies,” (2) the ability to filter and process monitoring metrics at their sources, to reduce data movement overhead, (3) flexibility in the rate at which monitoring is done, (4) efficient movement of monitoring data into backend stores for long-term or historical analysis, and most importantly, (5) specific support for monitoring the behavior and use of the GPGPUs used by applications. This paper presents our initial experiences with using ClusterWatch to assess the performance behavior of a larger-scale GPGPU-based simulation code. We report the overheads seen when using ClusterWatch, the experimental results obtained for the simulation, and the manner in which ClusterWatch will interact with infrastructures for detailed program performance monitoring and profiling such as TAU or Lynx. Experiments conducted on the NICS Keeneland Initial Delivery System (KIDS), with up to 64 nodes, demonstrate low monitoring overheads for high fidelity assessments of the simulation’s performance behavior, for both its CPU and GPU components. en_US
dc.embargo.terms null en_US
dc.identifier.uri http://hdl.handle.net/1853/53626
dc.language.iso en_US en_US
dc.publisher Georgia Institute of Technology en_US
dc.relation.ispartofseries CERCS ; GIT-CERCS-13-07 en_US
dc.subject Cluster spies en_US
dc.subject GPGPU en_US
dc.subject Parallel applications en_US
dc.subject System-level metrics en_US
dc.title ClusterWatch: Flexible, Lightweight Monitoring for High-end GPGPU Clusters en_US
dc.type Text
dc.type.genre Technical Report
dspace.entity.type Publication
local.contributor.author Schwan, Karsten
local.contributor.corporatename Center for Experimental Research in Computer Systems
local.relation.ispartofseries CERCS Technical Report Series
relation.isAuthorOfPublication a89a7e85-7f70-4eee-a49a-5090d7e88ce6
relation.isOrgUnitOfPublication 1dd858c0-be27-47fd-873d-208407cf0794
relation.isSeriesOfPublication bc21f6b3-4b86-4b92-8b66-d65d59e12c54
Files
Original bundle
Name: git-cercs-13-07.pdf
Size: 1.42 MB
Format: Adobe Portable Document Format