Person:
Kim, Hyesoon

Publication Search Results

Now showing 1 - 5 of 5

Power Modeling for GPU Architecture Using McPAT

2013 , Lim, Jieun , Lakshminarayana, Nagesh B. , Kim, Hyesoon , Song, William , Yalamanchili, Sudhakar , Sung, Wonyong

Graphics Processing Units (GPUs) are very popular for both graphics and general-purpose applications. Since GPUs operate many processing units and manage multiple levels of memory hierarchy, they consume a significant amount of power. Although several power models for CPUs are available, the power consumption of GPUs has received little study. In this paper, we develop a new power model for GPUs by utilizing McPAT, a CPU power tool. We generate initial power model data from McPAT with a detailed GPU configuration, and then adjust the models by comparing them with empirical data. We use NVIDIA’s Fermi architecture for building the power model, and our model estimates the GPU power consumption with average errors of 7.7% and 12.8% for the microbenchmarks and Merge benchmarks, respectively.

Evaluating Scalability of Multi-threaded Applications on a Many-core Platform

2012 , Gupta, Vishal , Kim, Hyesoon , Schwan, Karsten

Multicore processors have been effective in scaling application performance by dividing computation among multiple threads running in parallel. However, application performance does not necessarily improve as more cores are added. Application performance can be limited by multiple bottlenecks, including contention for shared resources such as caches and memory. In this paper, we perform a scalability analysis of parallel applications on a 64-threaded Intel Nehalem-EX based system. We find that applications that scale well on a small number of cores exhibit poor scalability on a large number of cores. Using hardware performance counters, we show that many performance-limited applications are limited by memory bandwidth on manycore platforms and exhibit improved scalability when provisioned with higher memory bandwidth. By regulating the number of threads used and applying dynamic voltage and frequency scaling for memory-bandwidth-limited benchmarks, significant energy savings can be achieved.

The AM-Bench: An Android Multimedia Benchmark Suite

2012 , Lee, Chayong , Kim, Euna , Kim, Hyesoon

Despite the significant evolution and increased use of mobile devices, few mobile benchmarks have been studied. Even though mobile applications share similar characteristics with traditional desktop-oriented applications, different programming environments and usage patterns give them different characteristics. In this paper, we introduce an open-source mobile multimedia benchmark suite for Android platforms (AM-Bench). The AM-Bench consists of several multimedia benchmarks running on Android platforms. We explain the characteristics of the AM-Bench and compare performance on four Android-based platforms.

A New Temperature Distribution Measurement Method on GPU Architectures Using Thermocouples

2012 , Dasgupta, Aniruddha , Hong, Sunpyo , Kim, Hyesoon , Park, Jinil

In recent years, many-core architectures have seen a rapid increase in the number of on-chip cores with a much slower increase in die area. This has led to very high power densities in the chip. Hence, in addition to power, temperature has become a first-order design constraint for high-performance architectures. However, temperature measurement is largely limited to on-chip temperature sensors, which might not always be available to researchers. In this paper, we propose a new temperature-measurement system using thermocouples for many-core GPU architectures and devise a new method to control GPU scheduling. This system gives us a temperature distribution heatmap of the chip. In addition to monitoring temperature distribution, our system also monitors run-time power consumption. The results show that there is a strong correlation between the on-chip heatmap patterns and power consumption. Furthermore, we provide actual experimental results that show the relationship between TPC utilizations and their active locations that reduce temperature and power consumption.

Design Space Exploration of On-chip Ring Interconnection for a CPU-GPU Architecture

2012 , Lee, Jaekyu , Li, Si , Kim, Hyesoon , Yalamanchili, Sudhakar

Future chip multiprocessors (CMPs) will only grow in core count and diversity in terms of frequency, power consumption, and resource distribution. Incorporating a GPU architecture, which is more efficient with certain types of applications, into a CMP is the next stage in this trend. This heterogeneous mix of architectures will use an on-chip interconnection to access shared resources such as last-level cache tiles and memory controllers. The configuration of this on-chip network will likely have a significant impact on resource distribution, fairness, and overall performance. The heterogeneity of this architecture inevitably exerts different pressures on the interconnection due to the differing characteristics and requirements of applications running on CPU and GPU cores. CPU applications are sensitive to latency, while GPGPU applications require massive bandwidth. This is due to the difference in the thread-level parallelism of the two architectures. GPUs use more threads to hide the effect of memory latency but require massive bandwidth to supply those threads. On the other hand, CPU cores, which typically run only one or two threads concurrently, are very sensitive to latency. This study surveys the impact and behavior of the interconnection network when CPU and GPGPU applications run simultaneously. This will shed light on other architectural interconnection studies on CPU-GPU heterogeneous architectures.