Title: Write-Optimized Indexing for Log-Structured Key-Value Stores

dc.contributor.author Tang, Yuzhe
dc.contributor.author Iyengar, Arun
dc.contributor.author Tan, Wei
dc.contributor.author Fong, Liana
dc.contributor.author Liu, Ling
dc.contributor.corporatename Georgia Institute of Technology. Center for Experimental Research in Computer Systems en_US
dc.contributor.corporatename Georgia Institute of Technology. College of Computing en_US
dc.contributor.corporatename IBM Thomas J. Watson Research Center en_US
dc.date.accessioned 2015-06-09T16:58:46Z
dc.date.available 2015-06-09T16:58:46Z
dc.date.issued 2014
dc.description.abstract The recent shift towards write-intensive workloads on big data (e.g., financial trading, social user-generated data streams) has driven the proliferation of log-structured key-value stores, represented by Google’s BigTable, HBase and Cassandra; these systems optimize write performance by adopting a log-structured merge design. While they provide key-based access methods through a Put/Get interface, these key-value stores do not support value-based access methods, which significantly limits their applicability in many web and Internet applications, such as real-time search for all tweets or blogs containing “government shutdown”. In this paper, we present HINDEX, a write-optimized indexing scheme for log-structured key-value stores. To index intensively updated big data in real time, HINDEX keeps index maintenance lightweight through a design tailored to the unique characteristics of the underlying log-structured key-value stores. Concretely, HINDEX performs append-only index updates, which avoid reading historic data versions, an expensive operation in a log-structured store. To fix potentially obsolete index entries, HINDEX uses an offline index repair process that is tightly coupled with routine compactions. HINDEX’s system design is generic to the Put/Get interface; we implemented a prototype of HINDEX on HBase without modifying its internal code. Our experiments show that HINDEX offers a significant performance advantage for write-intensive index maintenance. en_US
dc.embargo.terms null en_US
dc.identifier.uri http://hdl.handle.net/1853/53629
dc.language.iso en_US en_US
dc.publisher Georgia Institute of Technology en_US
dc.relation.ispartofseries CERCS ; GIT-CERCS-14-01 en_US
dc.subject Index maintenance en_US
dc.subject Key-based access methods en_US
dc.subject Key-value stores en_US
dc.subject Value-based access methods en_US
dc.subject Write-intensive index maintenance en_US
dc.title Write-Optimized Indexing for Log-Structured Key-Value Stores en_US
dc.type Text
dc.type.genre Technical Report
dspace.entity.type Publication
local.contributor.author Liu, Ling
local.contributor.corporatename Center for Experimental Research in Computer Systems
local.relation.ispartofseries CERCS Technical Report Series
relation.isAuthorOfPublication 96391b98-ac42-4e2c-93ee-79a5e16c2dfb
relation.isOrgUnitOfPublication 1dd858c0-be27-47fd-873d-208407cf0794
relation.isSeriesOfPublication bc21f6b3-4b86-4b92-8b66-d65d59e12c54
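The two mechanisms summarized in the abstract, append-only index updates and index repair piggybacked on compaction, can be illustrated with a toy in-memory model. The following is a minimal sketch under our own assumptions, not the HINDEX prototype on HBase: the class and method names (`ToyLogStore`, `AppendOnlyIndex`, `compact_and_repair`), the integer logical timestamps, and the in-memory set standing in for the index table are all hypothetical.

```python
from collections import defaultdict


class ToyLogStore:
    """Hypothetical in-memory stand-in for a log-structured store (not HBase):
    every put appends a new version instead of overwriting in place."""

    def __init__(self):
        self.versions = defaultdict(list)  # row key -> [(ts, value), ...], oldest first
        self.clock = 0  # hypothetical logical timestamp

    def put(self, key, value):
        self.clock += 1
        self.versions[key].append((self.clock, value))
        return self.clock

    def compact(self, on_discard=None):
        """Keep only the newest version of each key, reporting each version dropped."""
        for key, vs in list(self.versions.items()):
            for ts, old_value in vs[:-1]:
                if on_discard is not None:
                    on_discard(key, ts, old_value)
            self.versions[key] = vs[-1:]


class AppendOnlyIndex:
    """Hypothetical value index stored as (value, row key, ts) entries."""

    def __init__(self, store):
        self.store = store
        self.entries = set()

    def put(self, key, value):
        ts = self.store.put(key, value)
        # Append-only update: the previous value is never read, so an entry
        # for the old value (if any) stays behind as an obsolete entry.
        self.entries.add((value, key, ts))

    def lookup(self, value):
        # Plain index scan; may return obsolete keys until repair runs.
        return sorted({k for v, k, _ in self.entries if v == value})

    def compact_and_repair(self):
        # Compaction already visits every superseded version, so the matching
        # index entries can be deleted without any extra read of historic data.
        self.store.compact(
            on_discard=lambda key, ts, old: self.entries.discard((old, key, ts))
        )


idx = AppendOnlyIndex(ToyLogStore())
idx.put("tweet1", "government shutdown")
idx.put("tweet2", "government shutdown")
idx.put("tweet1", "budget talks")  # tweet1 no longer matches the query
print(idx.lookup("government shutdown"))  # ['tweet1', 'tweet2'] (tweet1 is obsolete)
idx.compact_and_repair()
print(idx.lookup("government shutdown"))  # ['tweet2']
```

In this toy model, a Put costs one base write plus one index write and never a read of the old version; the price is that obsolete index entries linger until the next compaction removes them.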
Files
Original bundle
Name: git-cercs-14-01.pdf
Size: 415.2 KB
Format: Adobe Portable Document Format
License bundle
Name: license.txt
Size: 3.13 KB
Format: Item-specific license agreed upon to submission