Title: PreDatA - Preparatory Data Analytics on Peta-Scale Machines
dc.contributor.author | Zheng, Fang | |
dc.contributor.author | Abbasi, Hasan | |
dc.contributor.author | Docan, Ciprian | |
dc.contributor.author | Lofstead, Jay | |
dc.contributor.author | Klasky, Scott | |
dc.contributor.author | Liu, Qing | |
dc.contributor.author | Parashar, Manish | |
dc.contributor.author | Podhorszki, Norbert | |
dc.contributor.author | Schwan, Karsten | |
dc.contributor.author | Wolf, Matthew | |
dc.contributor.corporatename | Georgia Institute of Technology. College of Computing | |
dc.contributor.corporatename | Georgia Institute of Technology. Center for Experimental Research in Computer Systems | |
dc.contributor.corporatename | Rutgers University. Center for Autonomic Computing | |
dc.contributor.corporatename | Oak Ridge National Laboratory | |
dc.date.accessioned | 2011-01-21T21:39:22Z | |
dc.date.available | 2011-01-21T21:39:22Z | |
dc.date.issued | 2010 | |
dc.description.abstract | Peta-scale scientific applications running on High End Computing (HEC) platforms can generate large volumes of data. To support high-performance storage and to be useful to scientific end users, such data must be laid out appropriately, indexed, sorted, and otherwise manipulated for subsequent presentation, visualization, and detailed analysis. In addition, scientists want to gain insight into selected data characteristics ‘hidden’ or ‘latent’ in these massive datasets while the simulations are still producing them. PreDatA, short for Preparatory Data Analytics, is an approach for preparing and characterizing data while it is being produced by large-scale simulations running on peta-scale machines. PreDatA dedicates additional compute nodes on the peta-scale machine as staging nodes and routes the simulation's output data through them. It can therefore exploit their computational power to perform selected data manipulations with lower latency than is attainable by first moving the data into file systems and storage. The PreDatA middleware supports such in-transit manipulations through three mechanisms: RDMA-based data movement that reduces write latency; application-specific operations on streaming data that can discover latent data characteristics; and data reorganization and metadata annotation that speed up subsequent data access. As a result, PreDatA enhances the scalability and flexibility of the current I/O stack on HEC platforms. It is useful for data pre-processing, runtime data analysis and inspection, and data exchange between concurrently running simulation models. Performance evaluations with several production peta-scale applications on Oak Ridge National Laboratory’s Leadership Computing Facility demonstrate the feasibility and advantages of the PreDatA approach. | en_US |
dc.identifier.uri | http://hdl.handle.net/1853/36670 | |
dc.language.iso | en_US | en_US |
dc.publisher | Georgia Institute of Technology | en_US |
dc.relation.ispartofseries | CERCS ; GIT-CERCS-10-01 | en_US |
dc.subject | Global data knowledge | en_US |
dc.subject | High end computing | en_US |
dc.subject | High performance | en_US |
dc.subject | Peta-scale | en_US |
dc.subject | PreDatA | en_US |
dc.subject | Preparatory Data Analytics | en_US |
dc.title | PreDatA - Preparatory Data Analytics on Peta-Scale Machines | en_US |
dc.type | Text | |
dc.type.genre | Technical Report | |
dspace.entity.type | Publication | |
local.contributor.author | Schwan, Karsten | |
local.contributor.corporatename | Center for Experimental Research in Computer Systems | |
local.relation.ispartofseries | CERCS Technical Report Series | |
relation.isAuthorOfPublication | a89a7e85-7f70-4eee-a49a-5090d7e88ce6 | |
relation.isOrgUnitOfPublication | 1dd858c0-be27-47fd-873d-208407cf0794 | |
relation.isSeriesOfPublication | bc21f6b3-4b86-4b92-8b66-d65d59e12c54 | |