Title:
Secure and high-performance big-data systems in the cloud

dc.contributor.advisor Liu, Ling
dc.contributor.author Tang, Yuzhe
dc.contributor.committeeMember Ahamad, Mustaque
dc.contributor.committeeMember Blough, Doug
dc.contributor.committeeMember Omiecinski, Edward
dc.contributor.committeeMember Pu, Calton
dc.contributor.department Computer Science
dc.date.accessioned 2015-09-21T15:52:12Z
dc.date.available 2015-09-22T05:30:06Z
dc.date.created 2014-08
dc.date.issued 2014-05-16
dc.date.submitted August 2014
dc.date.updated 2015-09-21T15:52:12Z
dc.description.abstract Cloud computing and big data technology continue to revolutionize how computing and data analysis are delivered today and in the future. To store and process fast-changing big data, various scalable systems (e.g., key-value stores and MapReduce) have recently emerged in industry. However, there is a huge gap between what these open-source software systems can offer and what real-world applications demand. First, scalable key-value stores are designed for simple data access methods, which limits their use in advanced database applications. Second, existing systems in the cloud need automatic performance optimization for better resource management with minimal operational overhead. Third, demand continues to grow for privacy-preserving search and information sharing between autonomous data providers, as exemplified by healthcare information networks. My Ph.D. research aims at bridging these gaps. First, I proposed HINDEX, which provides secondary-index support on top of write-optimized key-value stores (e.g., HBase and Cassandra). To update the index structure efficiently in the face of an intensive write stream, HINDEX synchronously executes append-only operations and defers the expensive index-repair operations. The core contribution of HINDEX is a scheduling framework for deferred and lightweight execution of index repairs. HINDEX has been implemented and is currently being transferred to an IBM big data product. Second, I proposed Auto-pipelining for automatic performance optimization of streaming applications on multi-core machines. The goal is to prevent the bottleneck scenario in which the streaming system is blocked by a single core while all other cores sit idle, which wastes resources.
To partition the streaming workload evenly across all cores and to search for the best partitioning among many possibilities, I proposed a heuristic-based search strategy that achieves a locally optimal partitioning with lightweight search overhead. The key idea is to use a white-box approach to search for the theoretically best partitioning and then use a black-box approach to verify the effectiveness of that partitioning. The proposed technique, called Auto-pipelining, is implemented on IBM Stream S. Third, I proposed ε-PPI, a suite of privacy-preserving index algorithms that allow data sharing among unknown parties while maintaining a desired level of data privacy. To differentiate the privacy concerns of different persons, I proposed a personalized privacy definition and substantiated this new privacy requirement by injecting false positives into the published ε-PPI data. To construct the ε-PPI securely and efficiently, I proposed to optimize the performance of multi-party computations, which are otherwise expensive; the key idea is to use an inexpensive addition-homomorphic secret-sharing mechanism and to perform the distributed computation in a scalable P2P overlay.
dc.description.degree Ph.D.
dc.embargo.terms 2015-08-01
dc.format.mimetype application/pdf
dc.identifier.uri http://hdl.handle.net/1853/53995
dc.language.iso en_US
dc.publisher Georgia Institute of Technology
dc.subject Cloud
dc.subject Big-data
dc.subject Security
dc.subject Efficiency
dc.subject Performance
dc.subject Streaming
dc.subject Multi-core
dc.subject Index
dc.subject Key-value stores
dc.subject Privacy preserving
dc.subject Performance optimization
dc.subject Log-structured systems
dc.title Secure and high-performance big-data systems in the cloud
dc.type Text
dc.type.genre Dissertation
dspace.entity.type Publication
local.contributor.advisor Liu, Ling
local.contributor.corporatename College of Computing
relation.isAdvisorOfPublication 96391b98-ac42-4e2c-93ee-79a5e16c2dfb
relation.isOrgUnitOfPublication c8892b3c-8db6-4b7b-a33a-1b67f7db2021
thesis.degree.level Doctoral
Files
Original bundle
Name:
TANG-DISSERTATION-2014.pdf
Size:
1.87 MB
Format:
Adobe Portable Document Format