Person:

Liu, Ling

Permanent Link

https://hdl.handle.net/1853/71467

Associated Organization(s)

Organizational Unit

School of Computer Science

Publication Search Results

Efficient and Secure Search of Enterprise File Systems

(Georgia Institute of Technology, 2007) Singh, Aameek ; Srivatsa, Mudhakar ; Liu, Ling

With the fast-paced growth of enterprise data, quickly locating relevant content has become a critical IT capability. Research has shown that nearly 85% of enterprise data lies in flat filesystems [12] that allow multiple users and user groups with different access privileges to the underlying data. Any search tool for such large-scale systems needs to be efficient and yet cognizant of the access control semantics imposed by the underlying filesystem. Current multiuser enterprise search techniques use two disjoint search and access-control components by creating a single system-wide index and simply filtering search results for access control. This approach is ineffective as the index and query statistics subtly leak private information. The other available approach of using separate indices for each user is undesirable as it not only increases disk consumption due to shared files, but also increases the overhead of updating the indices whenever a file changes. We propose a distributed approach that couples search and access control into a unified framework and provides secure multiuser search. Our scheme (logically) divides data into independent access-privilege-based chunks, called access-control barrels (ACBs). ACBs not only manage security but also improve overall efficiency as they can be indexed and searched in parallel by distributing them to multiple enterprise machines. We describe the architecture of the ACB-based search framework and propose two optimization techniques that ensure the scalability of our approach. We also discuss other useful features of our approach: seamless integration with desktop search and an extension to provide secure search in untrusted storage service provider environments. We validate our approach with a detailed evaluation using industry benchmarks and real datasets. Our initial experiments show secure search with 38% improved indexing efficiency and low overheads for ACB processing.
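
The barrel idea in the abstract above can be illustrated with a minimal sketch. It assumes, purely for illustration, that each file carries a flat set of readers and that a barrel is the group of files sharing exactly the same reader set; the function names and the substring "search" are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def build_barrels(file_acl):
    """Group files by the exact set of principals allowed to read them.

    file_acl maps a file path to an iterable of reader names (an assumed,
    simplified stand-in for real filesystem permissions).
    """
    barrels = defaultdict(list)
    for path, readers in file_acl.items():
        barrels[frozenset(readers)].append(path)
    return dict(barrels)


def barrels_for_user(barrels, user):
    """Return only the barrels whose reader set includes the user."""
    return {readers: files for readers, files in barrels.items() if user in readers}


def search(barrels, contents, user, term):
    """Search only the barrels visible to the user.

    Files in other barrels are never touched, so neither results nor
    per-barrel statistics depend on data the user cannot read.
    """
    hits = []
    for files in barrels_for_user(barrels, user).values():
        hits.extend(path for path in files if term in contents[path])
    return sorted(hits)
```

Because each barrel is independent, the loop in `search` could in principle be distributed across machines, which is the parallelism the abstract refers to.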
A Recovery Conscious Framework for Fault Resilient Storage Systems

(Georgia Institute of Technology, 2007) Seshadri, Sangeetha ; Liu, Ling ; Chiu, Lawrence ; Constantinescu, Cornel ; Balachandran, Subashini

In this paper we present a recovery-conscious framework for improving the fault resiliency and recovery efficiency of highly concurrent embedded storage software systems. Our framework consists of a three-tier architecture and a suite of recovery conscious techniques. In the top tier, we promote fine-grained recovery at the task level by introducing recovery groups to model recovery dependencies between tasks. At the middle tier we develop highly effective mappings of dependent tasks to processor resources through careful tuning of recovery efficiency sensitive parameters. At the bottom tier, we advocate the use of recovery-conscious scheduling by careful serialization of dependent tasks, which provides high recovery efficiency without sacrificing system performance. We develop a formal model to guide the understanding and the development of techniques for effectively mapping fine-grained tasks to system resources, aiming at reducing the ripple effect of software failures while sustaining high performance even during system recovery. Our techniques have been implemented on a real industry-standard storage system. Experimental results show that our techniques are effective, non-intrusive and can significantly boost system resilience while delivering high performance.
PRIVACYGRID: Supporting Anonymous Location Queries in Mobile Environments

(Georgia Institute of Technology, 2007) Bamba, Bhuvan ; Liu, Ling

We present PRIVACYGRID, a framework for supporting anonymous location-based queries in mobile information delivery systems. The PRIVACYGRID framework offers three unique capabilities. First, we provide a location privacy preference profile model, called location P3P, which allows mobile users to explicitly define their preferred location privacy requirements in terms of both location hiding measures (e.g., location k-anonymity and location l-diversity) and location service quality measures (e.g., maximum spatial resolution and maximum temporal resolution). Second, we develop three fast and effective location cloaking algorithms for providing location k-anonymity and location l-diversity in a mobile environment. The Quad Grid cloaking algorithm is fast but has a lower anonymization success rate. The dynamic bottom-up and top-down grid cloaking algorithms provide a much higher anonymization success rate and yet are efficient in terms of both time complexity and maintenance cost. Finally, we discuss a hybrid approach that combines the top-down and bottom-up search of location cloaking regions to further lower the average anonymization time. In addition, we argue for incorporating temporal cloaking into the location cloaking process to further increase the success rate of location anonymization. We also discuss the PRIVACYGRID mechanisms for anonymous support of range queries. Our experimental evaluation shows that the PRIVACYGRID approach can provide optimal location anonymity as defined by per-user location P3P without introducing significant performance penalties.
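
As a rough illustration of grid-based cloaking for location k-anonymity, the sketch below grows a square region of grid cells around the user until it covers at least k users or exceeds a maximum side length. This is a deliberately simplified stand-in, not any of the three PRIVACYGRID algorithms; the function name, the square expansion rule, and the `max_side` parameter (standing in for a maximum spatial resolution) are assumptions.

```python
def cloak(counts, row, col, k, max_side):
    """Grow a square region around (row, col) until it holds at least k users.

    counts is a 2D list of per-cell user counts. Returns the region as
    (top, left, bottom, right) in cell indices, or None when the region
    would exceed max_side cells per side (anonymization failure).
    """
    n_rows, n_cols = len(counts), len(counts[0])
    radius = 0
    while 2 * radius + 1 <= max_side:
        # Clamp the candidate region to the grid boundaries.
        top, bottom = max(0, row - radius), min(n_rows - 1, row + radius)
        left, right = max(0, col - radius), min(n_cols - 1, col + radius)
        total = sum(
            counts[r][c]
            for r in range(top, bottom + 1)
            for c in range(left, right + 1)
        )
        if total >= k:
            return (top, left, bottom, right)
        radius += 1
    return None
```

A request that cannot be satisfied within `max_side` returns `None`, which corresponds to the anonymization failures that the success-rate comparison in the abstract is about.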
A SpatioTemporal Placement Model for Caching Location Dependent Queries

(Georgia Institute of Technology, 2007) Murugappan, Anand ; Liu, Ling

Client side caching of location dependent queries is an important technique for improving performance of location-based services. Most of the existing research in this area has focused on cache replacement and invalidation through incorporating some aspects of the spatial and temporal semantics embedded in the location queries, while assuming an ad hoc cache placement. Very few have studied the impact of spatial and temporal validity semantics and the motion behavior of mobile clients on the effectiveness of cache placement and ultimately the performance of the client cache. This paper proposes an adaptive spatio-temporal placement scheme for caching location dependent queries. The cache placement decision is made according to the potential cache benefit of the query results based on the spatio-temporal properties of query results and the movement patterns of the mobile client, aiming at increasing the cache hit ratio. We introduce the concept of ‘Overlapping Cache Benefit’ as a measure of the hit rate of a cached item, and present three spatio-temporal cache placement schemes, which provide a step-by-step in-depth analysis of various factors that may affect the performance of a client cache in mobile environments. We implemented the spatio-temporal placement model in the first prototype of the MOBICACHE system. Our experimental evaluation shows that the spatial locality and the movement patterns of mobile clients are critical factors that impact the effectiveness of cache placement and the performance of client cache, and the proposed adaptive spatio-temporal cache placement approach yields higher hit ratio and better response time compared to existing mobile cache solutions.
CubeCache: Efficient and Scalable Processing of OLAP Aggregation Queries in a Peer-to-Peer Network

(Georgia Institute of Technology, 2007) Seshadri, Sangeetha ; Cooper, Brian F. ; Liu, Ling

Peer-to-peer (P2P) data sharing systems are emerging as a promising infrastructure for collaborative data sharing among multiple geographically distributed data centers within a large enterprise. This paper presents CubeCache, a peer-to-peer system for efficiently serving OLAP queries and data cube aggregations in a distributed data warehouse system. CubeCache combines multiple client caches into a single query processing and caching system. Compared to existing peer-to-peer systems, the CubeCache solution has a number of unique features. First, we add a query processing layer to perform in-network data aggregation over peer caches. Second, we introduce the concept of Query-Trails: a cache listing recent data requestors. Query-Trails make it easier to find caches that are likely to have data needed for a query. Third, we design a benefit measure that incorporates the 'rarity' of a chunk into the notion of benefit, allowing controlled replication of chunks in a system plagued by frequent node departures or failures. We report the results of analysis and an experimental study using simulations and an implemented prototype, which show that the CubeCache solution reduces the server load, improves query throughput and reduces query latency for OLAP tasks.
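
The Query-Trail concept, a cache listing recent data requestors, might be sketched as a small bounded list per data chunk. The class name, the per-chunk capacity, and the most-recent-first ordering below are illustrative assumptions rather than details from the paper.

```python
from collections import defaultdict, deque


class QueryTrail:
    """Remember the most recent requestors of each chunk (hypothetical sketch)."""

    def __init__(self, capacity=3):
        # One bounded queue per chunk; the oldest requestor is evicted first.
        self._trails = defaultdict(lambda: deque(maxlen=capacity))

    def record(self, chunk_id, peer):
        """Note that `peer` just requested `chunk_id`."""
        trail = self._trails[chunk_id]
        if peer in trail:
            trail.remove(peer)  # Move a repeat requestor to the newest slot.
        trail.append(peer)

    def candidates(self, chunk_id):
        """Peers likely to cache the chunk, most recent requestor first."""
        return list(reversed(self._trails.get(chunk_id, ())))
```

A peer looking for a chunk would try the returned candidates in order before falling back to the server.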
What Where Wi: an Analysis of Millions of Wi-Fi Access Points

(Georgia Institute of Technology, 2006) Jones, R. Kipp ; Liu, Ling

With the growing demand for wireless Internet access and increasing maturity of IEEE 802.11 technologies, wireless networks have sprung up by the millions throughout the world as a popular means for Internet access at homes, in offices and in public areas, such as airports, cafés and coffee shops. An increasingly popular use of IEEE 802.11 networking equipment is to provide wireless "hotspots" as the wireless access points to the Internet. These wireless access points, commonly referred to as WAPs or simply APs, are installed and managed by individuals and businesses in an unregulated manner, allowing anyone to install and operate one of these radio devices using unlicensed radio spectrum. This has allowed literally millions of these APs to become available and "visible" to any interested party who happens to be within range of the radio waves emitted from the device. As the density of these APs increases, these "beacons" can be put to multiple uses. From home networking to wireless positioning to mesh networks, there are more alternative ways for connecting wirelessly as newer, longer-range technologies come to market. This paper reports an initial study that examines a database of over 5 million wireless access points collected through wardriving by Skyhook Wireless. By performing an analytical study of this data and the information it reveals, including the default naming behavior, movement of access points over time, and density of access points, we found that the AP data, coupled with location information, can provide a fertile ground for understanding the "What, Where and Why" of Wi-Fi access points. More importantly, the analysis and mining of this vast and growing collection of AP data can yield important technological, social and economic results.
LIRA: Lightweight, Region-aware Load Shedding in Mobile CQ Systems

(Georgia Institute of Technology, 2006) Gedik, Bugra ; Liu, Ling ; Wu, Kun-Lung ; Yu, Philip S.

Position updates and query re-evaluations are two predominant, costly components of processing location-based, continual queries (CQs) in mobile systems. To obtain high-quality query results, the query processor usually demands receiving frequent position updates from the mobile nodes. However, processing frequent updates oftentimes causes the query processor to become overloaded, under which updates must be dropped randomly, bringing down the quality of query results, negating the benefits of frequent position updates. In this paper, we develop LIRA, a lightweight, region-aware load-shedding technique for preventively reducing the position-update load of a query processor, while maintaining high-quality query results. Instead of having to receive too many updates and then randomly drop some of them, LIRA uses a region-aware partitioning mechanism to identify the most beneficial shedding regions to cut down the position updates sent by the mobile nodes within those regions. Based on the number of mobile nodes and queries in a region, LIRA judiciously applies different amounts of update reduction for different regions, maintaining better overall accuracy of query results. Experimental results show that LIRA is vastly superior to random update dropping and clearly outperforms other alternatives that do not possess full-scale, region-aware load-shedding capabilities. Moreover, due to its lightweight nature, LIRA introduces very little overhead.
Scalable Access Control in Content-Based Publish-Subscribe Systems

(Georgia Institute of Technology, 2006) Srivatsa, Mudhakar ; Liu, Ling

Content-based publish-subscribe (pub-sub) systems are an emerging paradigm for building a large number of distributed systems. Access control in a pub-sub system refers to secure distribution of events to clients subscribing to those events without revealing their secret attributes to unauthorized subscribers. To provide confidentiality guarantees, the secret attributes in an event are encrypted so that only authorized subscribers can read them. However, in a content-based pub-sub system, every event can potentially have a different set of authorized subscribers. In the worst case, for NS subscribers, there are 2^NS subgroups, and each event can potentially go to a different subgroup. Hence, efficient key management is a big challenge for implementing access control in pub-sub systems. In this paper, we describe efficient and scalable key management algorithms for securely implementing access control rules in pub-sub systems. We ensure that the key management cost is linear in the number of subscriptions and completely independent of the number of subscribers NS. We present a concrete implementation of our proposal on an operational pub-sub system. An experimental evaluation of our prototype shows that our proposal meets the security requirements while maintaining the scalability and performance of the pub-sub system.
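
The worst-case count of 2^NS subgroups can be checked directly for small NS by enumerating every possible set of authorized subscribers. The helper below is only a counting illustration with a hypothetical name; it says nothing about the key management algorithms themselves.

```python
from itertools import combinations


def subscriber_subgroups(subscribers):
    """Enumerate every possible set of authorized subscribers.

    The empty set is included, so the result has 2**len(subscribers) entries.
    """
    members = sorted(subscribers)
    return [
        set(group)
        for size in range(len(members) + 1)
        for group in combinations(members, size)
    ]
```

Even 20 subscribers already give 2^20 = 1,048,576 possible subgroups, which is why a per-subgroup key is impractical and a cost independent of NS matters.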
Adaptive Load Shedding for Windowed Stream Joins

(Georgia Institute of Technology, 2005) Gedik, Bugra ; Wu, Kun-Lung ; Yu, Philip S. ; Liu, Ling

We present an adaptive load shedding approach for windowed stream joins. In contrast to the conventional approach of dropping tuples from the input streams, we explore the concept of selective processing for load shedding, focusing on costly stream joins such as those over set-valued or weighted set-valued attributes. The main idea of our adaptive load shedding approach is two-fold. First, we allow stream tuples to be stored in the windows and shed excessive CPU load by performing the stream join operations, not on the entire set of tuples within the windows, but on a dynamically changing subset of tuples that are highly beneficial. Second, we support such dynamic selective processing through three forms of runtime adaptations: By adaptation to input stream rates, we perform partial processing based load shedding and dynamically determine the fraction of the windows to be processed by comparing the tuple consumption rate of join operation to the incoming stream rates. By adaptation to time correlation between the streams, we dynamically determine the number of basic windows to be used and prioritize the tuples for selective processing, encouraging CPU-limited execution of stream joins in high priority basic windows. By adaptation to join directions, we dynamically determine the most beneficial direction to perform stream joins in order to process more useful tuples under heavy load conditions and boost the utility or number of output tuples produced. Our load shedding framework not only enables us to integrate utility-based load shedding with time correlation-based load shedding, but more importantly, it also allows load shedding to be adaptive to various dynamic stream properties. Inverted indexes are used to further speed up the execution of stream joins based on set-valued attributes. Experiments are conducted to evaluate the effectiveness of our adaptive load shedding approach in terms of output rate and utility.
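
The adaptation to input stream rates described above can be pictured as a feedback loop on the fraction of each window that the join processes. The sketch below is an assumed, simplified controller: the function name, the multiplicative `boost` factor, and the `floor` value are illustrative choices, not parameters from the paper.

```python
def adjust_fraction(fraction, consumed, arrived, boost=1.2, floor=0.01):
    """Update the fraction of the join windows to process in the next period.

    consumed: tuples the join managed to process in the last period.
    arrived:  tuples that arrived on the input streams in the same period.
    """
    if arrived == 0:
        return fraction  # Nothing arrived, so there is no signal to adapt to.
    ratio = consumed / arrived
    if ratio < 1.0:
        # Falling behind: shrink the processed fraction proportionally.
        new_fraction = fraction * ratio
    else:
        # Keeping up: cautiously process a larger part of the windows.
        new_fraction = fraction * boost
    return max(floor, min(1.0, new_fraction))
```

Called once per adaptation period, the fraction drops when the join consumes fewer tuples than arrive and creeps back toward 1.0 when load eases.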
GRUBJOIN: An Adaptive Multi-Way Windowed Stream Join with Time Correlation-Aware CPU Load Shedding

(Georgia Institute of Technology, 2005) Gedik, Bugra ; Wu, Kun-Lung ; Yu, Philip S. ; Liu, Ling

Dropping tuples has been commonly used for load shedding. However, tuple dropping generally is inadequate to shed load for multi-way windowed stream joins. The output rate can be unnecessarily and severely degraded because tuple dropping does not recognize time correlations likely to exist among the streams. This paper introduces GrubJoin: an adaptive multi-way windowed stream join that efficiently performs time correlation-aware CPU load shedding. GrubJoin maximizes the output rate by achieving near-optimal window harvesting within an operator throttling framework, i.e., regulating the fractions of the join windows that are processed by the multi-way join. Window harvesting performs the join using only certain more useful segments of the join windows. Due mainly to the combinatorial explosion of possible multi-way join sequences involving various segments of individual join windows, GrubJoin faces a set of unique challenges, such as determining the optimal window harvesting configuration and learning the time correlations among the streams. To tackle these challenges, we formalize window harvesting as an optimization problem, develop greedy heuristics to determine near-optimal window harvesting configurations and use approximation techniques to capture the time correlations among the streams. Experimental results show that GrubJoin is vastly superior to tuple dropping when time correlations exist among the streams and is as effective as tuple dropping in the absence of time correlations.
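
The core intuition of window harvesting, joining against only the more useful segments of a window, can be sketched for a single window as below. This is a much-simplified illustration, not GrubJoin's multi-way optimization: it assumes a per-segment usefulness score is already available (for example, an estimated match probability) and simply keeps the best-scoring segments within a throttle fraction. The function and parameter names are hypothetical.

```python
import math


def harvest(scores, fraction):
    """Pick the window segments to join against under a processing budget.

    scores:   estimated usefulness of each segment of one join window.
    fraction: share of the window that the throttle allows to be processed.
    Returns the indices of the chosen segments in window order.
    """
    n = len(scores)
    budget = min(n, math.ceil(fraction * n))
    # Greedy choice: take the highest-scoring segments first.
    ranked = sorted(range(n), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:budget])
```

When the scores are flat, meaning there is no time correlation, every choice of segments is equally good, which is consistent with the abstract's observation that the approach then performs like tuple dropping.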