Tuned and asynchronous stencil kernels for CPU/GPU systems

Venkatasubramanian, Sundaresan

Files

venkatasubramanian_sundaresan_200908_mast.pdf (1.56 MB)

Advisor(s)

Vuduc, Richard

Associated Organization(s)

College of Computing

Abstract

We describe heterogeneous multi-CPU and multi-GPU implementations of Jacobi's iterative method for the 2-D Poisson equation on a structured grid, in both single and double precision. Properly tuned, our best implementation achieves 98% of the empirical streaming GPU bandwidth (66% of peak) on an NVIDIA C1060. Motivated to find a still faster implementation, we further consider "wildly asynchronous" implementations that can reduce or even eliminate the synchronization bottleneck between iterations. In these versions, which are based on the principle of chaotic relaxation (Chazan and Miranker, 1969), we simply remove or delay synchronization between iterations. This potentially trades more flops (via more iterations to converge) for a higher degree of asynchronous parallelism. Our relaxed-synchronization implementations on a GPU can be 1.2-2.5x faster than our best synchronized GPU implementation while achieving the same accuracy. Looking forward, this result motivates research into similarly "fast-and-loose" algorithms for the coming era of increasingly massive concurrency and relatively high synchronization and communication costs.
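To make the contrast between synchronized Jacobi and chaotic relaxation concrete, here is a minimal, sequential Python sketch. It is not the thesis's CPU/GPU code. The test problem (a manufactured solution u = x(1-x)y(1-y) with zero boundary values), the function names, and the skip probability are all illustrative assumptions. Synchronous Jacobi writes into a separate buffer, so every point sees only the previous sweep's values (the role a barrier plays in parallel code). The chaotic variant updates points in place, in a random order, and randomly skips some updates. This emulates the mix of fresh and stale neighbor values that relaxed synchronization allows, but it does not model GPU timing or performance.

```python
import random

def make_problem(n):
    """Hypothetical test problem: -Laplace(u) = f on the unit square, u = 0 on the
    boundary, on an (n+2)x(n+2) grid with spacing h = 1/(n+1). The exact solution
    x(1-x)y(1-y) is quadratic in x and y, so the 5-point stencil reproduces it exactly."""
    h = 1.0 / (n + 1)
    f = [[0.0] * (n + 2) for _ in range(n + 2)]
    exact = [[0.0] * (n + 2) for _ in range(n + 2)]
    for i in range(n + 2):
        for j in range(n + 2):
            x, y = i * h, j * h
            exact[i][j] = x * (1 - x) * y * (1 - y)
            f[i][j] = 2.0 * (x * (1 - x) + y * (1 - y))
    return h, f, exact

def stencil(u, f, h, i, j):
    # 5-point Jacobi update: u_ij = (sum of 4 neighbors + h^2 f_ij) / 4
    return 0.25 * (u[i-1][j] + u[i+1][j] + u[i][j-1] + u[i][j+1] + h * h * f[i][j])

def residual_max(u, f, h):
    # Max-norm of f - (-Laplace_h u) over interior points
    n = len(u) - 2
    r = 0.0
    for i in range(1, n + 1):
        for j in range(1, n + 1):
            lap = (4 * u[i][j] - u[i-1][j] - u[i+1][j] - u[i][j-1] - u[i][j+1]) / (h * h)
            r = max(r, abs(f[i][j] - lap))
    return r

def jacobi(f, h, tol=1e-8, max_sweeps=10000):
    n = len(f) - 2
    u = [[0.0] * (n + 2) for _ in range(n + 2)]
    for sweep in range(1, max_sweeps + 1):
        new = [row[:] for row in u]          # separate output buffer
        for i in range(1, n + 1):
            for j in range(1, n + 1):
                new[i][j] = stencil(u, f, h, i, j)
        u = new                              # "barrier": next sweep reads only this sweep
        if residual_max(u, f, h) < tol:
            return u, sweep
    return u, max_sweeps

def chaotic(f, h, tol=1e-8, max_sweeps=10000, skip_prob=0.3, seed=0):
    rng = random.Random(seed)
    n = len(f) - 2
    u = [[0.0] * (n + 2) for _ in range(n + 2)]
    points = [(i, j) for i in range(1, n + 1) for j in range(1, n + 1)]
    for sweep in range(1, max_sweeps + 1):
        rng.shuffle(points)                  # arbitrary update order
        for i, j in points:
            if rng.random() < skip_prob:
                continue                     # "late" point keeps its stale value this sweep
            u[i][j] = stencil(u, f, h, i, j) # in place: reads a mix of old and new values
        if residual_max(u, f, h) < tol:
            return u, sweep
    return u, max_sweeps

h, f, exact = make_problem(8)
u_sync, sweeps_sync = jacobi(f, h)
u_async, sweeps_async = chaotic(f, h)
print("synchronous sweeps:", sweeps_sync, "chaotic sweeps:", sweeps_async)
```

Both variants should reach the same discrete solution here. For this stencil the Jacobi iteration matrix is nonnegative with spectral radius below 1, which is the Chazan and Miranker condition for chaotic relaxation to converge regardless of update order and bounded staleness. Sweep counts from a sequential emulation like this say nothing about wall-clock speed on a GPU.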

Date Issued

2009-05-18

Resource Type

Text

Resource Subtype

Thesis

Georgia Tech Library

Title: Tuned and asynchronous stencil kernels for CPU/GPU systems

Files

Author(s)

Authors

Advisor(s)

Advisor(s)

Editor(s)

Associated Organization(s)

Series

Collections

Supplementary to

Permanent Link

Abstract

Sponsor

Date Issued

Extent

Resource Type

Resource Subtype

Rights Statement

Rights URI

Title:

Tuned and asynchronous stencil kernels for CPU/GPU systems