Secure and high-performance big-data systems in the cloud

Tang, Yuzhe

dc.contributor.advisor	Liu, Ling
dc.contributor.author	Tang, Yuzhe
dc.contributor.committeeMember	Ahamad, Mustaque
dc.contributor.committeeMember	Blough, Doug
dc.contributor.committeeMember	Omiecinski, Edward
dc.contributor.committeeMember	Pu, Calton
dc.contributor.department	Computer Science
dc.date.accessioned	2015-09-21T15:52:12Z
dc.date.available	2015-09-22T05:30:06Z
dc.date.created	2014-08
dc.date.issued	2014-05-16
dc.date.submitted	August 2014
dc.date.updated	2015-09-21T15:52:12Z
dc.description.abstract	Cloud computing and big data technology continue to revolutionize how computing and data analysis are delivered today and in the future. To store and process fast-changing big data, various scalable systems (e.g., key-value stores and MapReduce) have recently emerged in industry. However, there is a huge gap between what these open-source software systems can offer and what real-world applications demand. First, scalable key-value stores are designed for simple data access methods, which limits their use in advanced database applications. Second, existing systems in the cloud need automatic performance optimization for better resource management with minimized operational overhead. Third, the demand continues to grow for privacy-preserving search and information sharing between autonomous data providers, as exemplified by healthcare information networks. My Ph.D. research aims at bridging these gaps. First, I proposed HINDEX, for secondary index support on top of write-optimized key-value stores (e.g., HBase and Cassandra). To update the index structure efficiently in the face of an intensive write stream, HINDEX synchronously executes append-only operations and defers the expensive index-repair operations. The core contribution of HINDEX is a scheduling framework for deferred and lightweight execution of index repairs. HINDEX has been implemented and is currently being transferred to an IBM big data product. Second, I proposed Auto-pipelining for automatic performance optimization of streaming applications on multi-core machines. The goal is to prevent the bottleneck scenario in which the streaming system is blocked by a single core while all other cores sit idle, wasting resources.
To partition the streaming workload evenly across all cores and to search for the best partitioning among many possibilities, I proposed a heuristic-based search strategy that achieves locally optimal partitioning with lightweight search overhead. The key idea is to use a white-box approach to search for the theoretically best partitioning and then use a black-box approach to verify the effectiveness of that partitioning. The proposed technique, called Auto-pipelining, is implemented on IBM Stream S. Third, I proposed ε-PPI, a suite of privacy-preserving index algorithms that allow data sharing among unknown parties while maintaining a desired level of data privacy. To differentiate the privacy concerns of different individuals, I proposed a personalized privacy definition and substantiated this new privacy requirement by injecting false positives into the published ε-PPI data. To construct the ε-PPI securely and efficiently, I proposed to optimize the performance of multi-party computations, which are otherwise expensive; the key idea is to use an inexpensive additively homomorphic secret-sharing mechanism and to carry out the distributed computation in a scalable P2P overlay.
dc.description.degree	Ph.D.
dc.embargo.terms	2015-08-01
dc.format.mimetype	application/pdf
dc.identifier.uri	http://hdl.handle.net/1853/53995
dc.language.iso	en_US
dc.publisher	Georgia Institute of Technology
dc.subject	Cloud
dc.subject	Big-data
dc.subject	Security
dc.subject	Efficiency
dc.subject	Performance
dc.subject	Streaming
dc.subject	Multi-core
dc.subject	Index
dc.subject	Key-value stores
dc.subject	Privacy preserving
dc.subject	Performance optimization
dc.subject	Log-structured systems
dc.title	Secure and high-performance big-data systems in the cloud
dc.type	Text
dc.type.genre	Dissertation
dspace.entity.type	Publication
local.contributor.advisor	Liu, Ling
local.contributor.corporatename	College of Computing
relation.isAdvisorOfPublication	96391b98-ac42-4e2c-93ee-79a5e16c2dfb
relation.isOrgUnitOfPublication	c8892b3c-8db6-4b7b-a33a-1b67f7db2021
thesis.degree.level	Doctoral