Person:
Kim, Hyesoon

Publication Search Results

  • Item
    HPerf: A Lightweight Profiler for Task Distribution on CPU+GPU Platforms
    (Georgia Institute of Technology, 2015) Lee, Joo Hwan ; Nigania, Nimit ; Kim, Hyesoon ; Brett, Bevin
    Heterogeneous computing has emerged as one of the major computing platforms in many domains. Although there have been several proposals to aid programming for heterogeneous computing platforms, optimizing applications on them is not an easy task. Identifying which parallel regions (or tasks) should run on GPUs or CPUs is one of the critical decisions for improving performance. In this paper, we propose a profiler, HPerf, to identify an efficient task distribution on a CPU+GPU system with low profiling overhead. HPerf is a hierarchical profiler: it first performs lightweight profiling and then, if necessary, performs detailed profiling to measure caching and data-transfer cost. Compared to a brute-force approach, HPerf reduces the profiling overhead significantly, and compared to a naive decision, it improves the performance of OpenCL applications by up to 25%.
  • Item
    Qameleon: Hardware/software cooperative automated tuning for heterogeneous architectures
    (Georgia Institute of Technology, 2013-08) Kim, Hyesoon ; Vuduc, Richard
    The main goal of this project is to develop a framework that simplifies programming for heterogeneous platforms. The framework consists of (i) a runtime system that generates code to partition and schedule work among heterogeneous processors, (ii) a general automated tuning mechanism based on machine learning, and (iii) performance- and power-modeling and profiling techniques that aid code generation.
  • Item
    SD³: A Scalable Approach to Dynamic Data-Dependence Profiling
    (Georgia Institute of Technology, 2011) Kim, Minjang ; Lakshminarayana, Nagesh B. ; Kim, Hyesoon ; Luk, Chi-Keung
    As multicore processors are deployed in mainstream computing, the need for software tools that help parallelize programs is increasing dramatically. Data-dependence profiling is an important technique for exploiting parallelism in programs. More specifically, manual or automatic parallelization can use the outcomes of data-dependence profiling to guide where to parallelize in a program. However, state-of-the-art data-dependence profiling techniques are not scalable, as they suffer from two major issues when profiling large and long-running applications: (1) runtime overhead and (2) memory overhead. Existing data-dependence profilers are either unable to profile large-scale applications or report only very limited information. In this paper, we propose a scalable approach to data-dependence profiling that addresses both runtime and memory overhead in a single framework. Our technique, called SD³, reduces the runtime overhead by parallelizing the dependence-profiling step itself. To reduce the memory overhead, we compress memory accesses that exhibit stride patterns and compute data dependences directly in the compressed format. We demonstrate that SD³ reduces the runtime overhead when profiling SPEC 2006 by factors of 4.1× and 9.7× on eight and 32 cores, respectively. For the memory overhead, we successfully profile SPEC 2006 with the reference input, whereas previous approaches fail even with the train input. In some cases, we observe more than a 20× improvement in memory consumption and a 16× speedup in profiling time when 32 cores are used.
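The SD³ abstract above describes compressing strided memory accesses and computing dependences directly on the compressed form. A minimal sketch of that idea follows; all names (`StridedAccesses`, `may_depend`) and the GCD-style overlap test are illustrative assumptions standing in for the paper's actual algorithm.

```python
from dataclasses import dataclass
from math import gcd

@dataclass
class StridedAccesses:
    """Compressed record of a strided access pattern: the addresses
    base, base+stride, ..., base+stride*(count-1)."""
    base: int
    stride: int
    count: int

    @property
    def last(self) -> int:
        return self.base + self.stride * (self.count - 1)

def may_depend(a: StridedAccesses, b: StridedAccesses) -> bool:
    """Conservative dependence test computed directly on the compressed
    representation, without expanding individual addresses.
    Assumes non-negative strides."""
    # Disjoint address ranges cannot share an address.
    if a.last < b.base or b.last < a.base:
        return False
    g = gcd(a.stride, b.stride)
    # Degenerate case: both patterns repeat a single address.
    if g == 0:
        return a.base == b.base
    # a.base + i*a.stride == b.base + j*b.stride has an integer solution
    # only if the base difference is divisible by gcd(a.stride, b.stride).
    return (a.base - b.base) % g == 0
```

The point of the sketch is that two long access streams collapse to three integers each, and the dependence question reduces to a range check plus a divisibility check, which is how a compressed-format profiler avoids both the runtime and the memory cost of per-address bookkeeping.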
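The HPerf abstract above describes a hierarchical profiler: a cheap first pass, with detailed profiling of caching and data-transfer cost only when needed. The following sketch illustrates that two-phase structure; the function names, the `margin` threshold, and the callback signatures are hypothetical, not HPerf's actual interface.

```python
def choose_device(task, lightweight_profile, detailed_profile, margin=0.2):
    """Pick 'cpu' or 'gpu' for a task using two profiling phases.

    lightweight_profile(task) -> (cpu_est, gpu_est): cheap time estimates.
    detailed_profile(task)    -> (cpu_full, gpu_full): estimates that also
    account for caching and data-transfer cost. All names illustrative.
    """
    cpu_est, gpu_est = lightweight_profile(task)
    # Phase 1: if one device is clearly faster, decide without paying
    # for detailed profiling.
    if abs(cpu_est - gpu_est) > margin * max(cpu_est, gpu_est):
        return 'cpu' if cpu_est < gpu_est else 'gpu'
    # Phase 2: the cheap estimates are too close to call, so run the
    # expensive detailed profiling before deciding.
    cpu_full, gpu_full = detailed_profile(task)
    return 'cpu' if cpu_full < gpu_full else 'gpu'
```

The design point is that the detailed (expensive) phase runs only for tasks the lightweight phase cannot classify confidently, which is what keeps overall profiling overhead low compared to a brute-force measurement of every task on every device.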