Title:
PreDatA - Preparatory Data Analytics on Peta-Scale Machines

dc.contributor.author Zheng, Fang
dc.contributor.author Abbasi, Hasan
dc.contributor.author Docan, Ciprian
dc.contributor.author Lofstead, Jay
dc.contributor.author Klasky, Scott
dc.contributor.author Liu, Qing
dc.contributor.author Parashar, Manish
dc.contributor.author Podhorszki, Norbert
dc.contributor.author Schwan, Karsten
dc.contributor.author Wolf, Matthew
dc.contributor.corporatename Georgia Institute of Technology. College of Computing
dc.contributor.corporatename Georgia Institute of Technology. Center for Experimental Research in Computer Systems
dc.contributor.corporatename Rutgers University. Center for Autonomic Computing
dc.contributor.corporatename Oak Ridge National Laboratory
dc.date.accessioned 2011-01-21T21:39:22Z
dc.date.available 2011-01-21T21:39:22Z
dc.date.issued 2010
dc.description.abstract Peta-scale scientific applications running on High End Computing (HEC) platforms can generate large volumes of data. For high-performance storage, and in order to be useful to science end users, such data must be organized in its layout, indexed, sorted, and otherwise manipulated for subsequent data presentation, visualization, and detailed analysis. In addition, scientists want to gain insight into selected data characteristics ‘hidden’ or ‘latent’ in the massive datasets while the data is still being produced by simulations. PreDatA, short for Preparatory Data Analytics, is an approach for preparing and characterizing data while it is being produced by large-scale simulations running on peta-scale machines. By dedicating additional compute nodes on the peta-scale machine as staging nodes and staging the simulation’s output data through these nodes, PreDatA can exploit their computational power to perform selected data manipulations with lower latency than is attainable by first moving the data into file systems and storage. The PreDatA middleware supports such in-transit manipulations through RDMA-based data movement to reduce write latency, application-specific operations on streaming data that can discover latent data characteristics, and appropriate data reorganization and metadata annotation to speed up subsequent data access. As a result, PreDatA enhances the scalability and flexibility of the current I/O stack on HEC platforms and is useful for data pre-processing, runtime data analysis and inspection, and data exchange between concurrently running simulation models. Performance evaluations with several production peta-scale applications on Oak Ridge National Laboratory’s Leadership Computing Facility demonstrate the feasibility and advantages of the PreDatA approach. en_US
dc.identifier.uri http://hdl.handle.net/1853/36670
dc.language.iso en_US en_US
dc.publisher Georgia Institute of Technology en_US
dc.relation.ispartofseries CERCS ; GIT-CERCS-10-01 en_US
dc.subject Global data knowledge en_US
dc.subject High end computing en_US
dc.subject High performance en_US
dc.subject Peta-scale en_US
dc.subject PreDatA en_US
dc.subject Preparatory Data Analytics en_US
dc.title PreDatA - Preparatory Data Analytics on Peta-Scale Machines en_US
dc.type Text
dc.type.genre Technical Report
dspace.entity.type Publication
local.contributor.author Schwan, Karsten
local.contributor.corporatename Center for Experimental Research in Computer Systems
local.relation.ispartofseries CERCS Technical Report Series
relation.isAuthorOfPublication a89a7e85-7f70-4eee-a49a-5090d7e88ce6
relation.isOrgUnitOfPublication 1dd858c0-be27-47fd-873d-208407cf0794
relation.isSeriesOfPublication bc21f6b3-4b86-4b92-8b66-d65d59e12c54
Files
Original bundle
Name:
git-cercs-10-01.pdf
Size:
508.75 KB
Format:
Adobe Portable Document Format
License bundle
Name:
license.txt
Size:
1.76 KB
Format:
Item-specific license agreed upon to submission