Organizational Unit:

School of Computational Science and Engineering

Permanent Link

https://hdl.handle.net/1853/70780

Parent Organization

Organizational Unit

College of Computing

ArchiveSpace Name Record

https://finding-aids.library.gatech.edu/agents/corporate_entities/1111

Full item page

Publication Search Results

Now showing 1 - 1 of 1

Data Tiling for Sparse Computation

(Georgia Institute of Technology, 2022-11-11) An, Xiaojing

Many real-world data contain internal relationships. Efficient analysis of these relationship data is crucial for important problems including genome alignment, network vulnerability analysis, ranking web pages, among others. Such relationship data is frequently sparse and analysis on it is called sparse computation. We demonstrate that the important technique of data tiling is more powerful than previously known by broadening its application space. We focus on three important sparse computation areas: graph analysis, linear algebra, and bioinformatics. We demonstrate data tiling's power by addressing key issues and providing significant improvements---to both runtime and solution quality---in each area. For graph analysis, we focus on fast data tiling techniques that can produce well-structured tiles and demonstrate theoretical hardness results. These tiles are suitable for graph problems as they reduce data movement and ultimately improve end-to-end runtime performance. For linear algebra, we introduce a new cache-aware tiling technique and apply it to the key kernel of sparse matrix by sparse matrix multiplication. This technique tiles the second input matrix and then uses a small, summary matrix to guide access to the tiles during computation. Our approach results in the fastest known implementation across three distinct CPU architectures. In bioinformatics, we develop a tiling based de novo genome assembly pipeline. We start with reads and develop either a graph or hypergraph that captures internal relationships between reads. This is then tiled to minimize connections while maintaining balance. We then treat each resulting tile independently as the input to an existing, shared-memory assembler. Our pipeline improves existing state-of-the-art de novo genome assemblers and brings both runtime and quality improvements to them on both real-world and simulated datasets.

Organizational Unit:

School of Computational Science and Engineering

Permanent Link

Research Organization Registry ID

Description

Previous Names

Parent Organization

Parent Organization

Includes Organization(s)

ArchiveSpace Name Record

Filters

Author

Advisor

Date

Organization

Resource Type

Resource Subtype

Has files

Record Type

Settings

Sort By

Results per page

Publication Search Results

Georgia Tech Library

Organizational Unit: School of Computational Science and Engineering

Permanent Link

Research Organization Registry ID

Description

Previous Names

Parent Organization

Parent Organization

Includes Organization(s)

ArchiveSpace Name Record

Filters

Author

Advisor

Date

Organization

Resource Type

Resource Subtype

Has files

Record Type

Settings

Sort By

Results per page

Publication Search Results

Organizational Unit:

School of Computational Science and Engineering