Person:
Liu, Ling

Publication Search Results

Now showing 1 - 10 of 23

Efficient and Secure Search of Enterprise File Systems

2007 , Singh, Aameek , Srivatsa, Mudhakar , Liu, Ling

With the fast-paced growth of enterprise data, quickly locating relevant content has become a critical IT capability. Research has shown that nearly 85% of enterprise data lies in flat filesystems [12] that allow multiple users and user groups with different access privileges to the underlying data. Any search tool for such large-scale systems needs to be efficient and yet cognizant of the access control semantics imposed by the underlying filesystem. Current multiuser enterprise search techniques use two disjoint search and access-control components, creating a single system-wide index and simply filtering search results for access control. This approach is ineffective because the index and query statistics subtly leak private information. The other available approach, using separate indices for each user, is undesirable because it not only increases disk consumption due to shared files but also increases the overhead of updating the indices whenever a file changes. We propose a distributed approach that couples search and access control into a unified framework and provides secure multiuser search. Our scheme (logically) divides data into independent chunks based on access privileges, called access-control barrels (ACBs). ACBs not only manage security but also improve overall efficiency, as they can be indexed and searched in parallel by distributing them to multiple enterprise machines. We describe the architecture of the ACB-based search framework and propose two optimization techniques that ensure the scalability of our approach. We also discuss other useful features of our approach: seamless integration with desktop search and an extension to provide secure search in untrusted storage service provider environments. We validate our approach with a detailed evaluation using industry benchmarks and real datasets. Our initial experiments show secure search with 38% improved indexing efficiency and low overheads for ACB processing.
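
The barrel idea in the abstract above can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that a barrel is simply the set of files sharing exactly the same reader set, and the function names (`build_barrels`, `index_barrel`, `search`) and the sample files are hypothetical.

```python
from collections import defaultdict

def build_barrels(file_acl):
    """Group files by the exact set of principals allowed to read them.

    Assumption: one barrel per distinct reader set.
    """
    barrels = defaultdict(set)
    for path, readers in file_acl.items():
        barrels[frozenset(readers)].add(path)
    return dict(barrels)

def index_barrel(paths, contents):
    """Build a tiny inverted index (term -> paths) for a single barrel."""
    index = defaultdict(set)
    for path in paths:
        for term in contents[path].lower().split():
            index[term].add(path)
    return index

def search(user, term, barrel_indices):
    """Consult only the barrels whose reader set includes the user."""
    hits = set()
    for readers, index in barrel_indices.items():
        if user in readers:
            hits |= index.get(term.lower(), set())
    return sorted(hits)

# Hypothetical sample data.
file_acl = {
    "/hr/salaries.txt": {"alice"},
    "/eng/design.txt": {"alice", "bob"},
    "/eng/notes.txt": {"alice", "bob"},
    "/pub/readme.txt": {"alice", "bob", "carol"},
}
contents = {
    "/hr/salaries.txt": "budget salaries",
    "/eng/design.txt": "budget design",
    "/eng/notes.txt": "design notes",
    "/pub/readme.txt": "public budget",
}
barrels = build_barrels(file_acl)
barrel_indices = {readers: index_barrel(paths, contents)
                  for readers, paths in barrels.items()}
```

Because each barrel has its own index, a query never touches statistics derived from files the user cannot read, and each shared file is indexed once rather than once per user. In this sketch the per-barrel indices are built in a single process; distributing them across machines is left out.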

A SpatioTemporal Placement Model for Caching Location Dependent Queries

2007 , Murugappan, Anand , Liu, Ling

Client side caching of location dependent queries is an important technique for improving performance of location-based services. Most of the existing research in this area has focused on cache replacement and invalidation through incorporating some aspects of the spatial and temporal semantics embedded in the location queries, while assuming an ad hoc cache placement. Very few have studied the impact of spatial and temporal validity semantics and the motion behavior of mobile clients on the effectiveness of cache placement and ultimately the performance of the client cache. This paper proposes an adaptive spatio-temporal placement scheme for caching location dependent queries. The cache placement decision is made according to the potential cache benefit of the query results based on the spatio-temporal properties of query results and the movement patterns of the mobile client, aiming at increasing the cache hit ratio. We introduce the concept of ‘Overlapping Cache Benefit’ as a measure of the hit rate of a cached item, and present three spatio-temporal cache placement schemes, which provide a step-by-step in-depth analysis of various factors that may affect the performance of a client cache in mobile environments. We implemented the spatio-temporal placement model in the first prototype of the MOBICACHE system. Our experimental evaluation shows that the spatial locality and the movement patterns of mobile clients are critical factors that impact the effectiveness of cache placement and the performance of client cache, and the proposed adaptive spatio-temporal cache placement approach yields higher hit ratio and better response time compared to existing mobile cache solutions.

LIRA: Lightweight, Region-aware Load Shedding in Mobile CQ Systems

2006 , Gedik, Bugra , Liu, Ling , Wu, Kun-Lung , Yu, Philip S.

Position updates and query re-evaluations are two predominant, costly components of processing location-based, continual queries (CQs) in mobile systems. To obtain high-quality query results, the query processor usually demands frequent position updates from the mobile nodes. However, processing frequent updates often causes the query processor to become overloaded, at which point updates must be dropped randomly, lowering the quality of query results and negating the benefits of frequent position updates. In this paper, we develop LIRA, a lightweight, region-aware load-shedding technique for preventively reducing the position-update load of a query processor while maintaining high-quality query results. Instead of having to receive too many updates and then randomly drop some of them, LIRA uses a region-aware partitioning mechanism to identify the most beneficial shedding regions and cut down the position updates sent by the mobile nodes within those regions. Based on the number of mobile nodes and queries in a region, LIRA judiciously applies different amounts of update reduction for different regions, maintaining better overall accuracy of query results. Experimental results show that LIRA is vastly superior to random update dropping and clearly outperforms other alternatives that do not possess full-scale, region-aware load-shedding capabilities. Moreover, due to its lightweight nature, LIRA introduces very little overhead.

A Random Rotation Perturbation Approach to Privacy Preserving Data Classification

2005 , Chen, Keke , Liu, Ling

This paper presents a random rotation perturbation approach for privacy preserving data classification. Concretely, we identify the importance of classification-specific information with respect to the loss of information factor, and present a random rotation perturbation framework for privacy preserving data classification. Our approach has two unique characteristics. First, we identify that many classification models utilize the geometric properties of datasets, which can be preserved by geometric rotation. We prove that the three types of classifiers will deliver the same performance over the rotation-perturbed dataset as over the original dataset. Second, we propose a multi-column privacy model to address the problems of evaluating privacy quality for multidimensional perturbation. With this metric, we develop a locally optimal algorithm to find a good rotation perturbation in terms of privacy guarantee. We also analyze both naive estimation and ICA-based reconstruction attacks with the privacy model. Our initial experiments show that the random rotation approach can provide a high privacy guarantee while maintaining zero loss of accuracy for the discussed classifiers.
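
The geometric claim in the abstract above, that rotation preserves the properties many classifiers rely on, can be checked with a small sketch. It uses two dimensions for readability, which is an illustrative assumption, and the helper names `rotate` and `pairwise_distances` are hypothetical. It shows only that pairwise Euclidean distances survive a random rotation. It does not reproduce the paper's privacy model or its optimization algorithm.

```python
import math
import random

def rotate(points, theta):
    """Rotate 2-D points about the origin by angle theta (radians)."""
    c, s = math.cos(theta), math.sin(theta)
    return [(c * x - s * y, s * x + c * y) for x, y in points]

def pairwise_distances(points):
    """Euclidean distance for every unordered pair, in a fixed order."""
    return [math.dist(p, q)
            for i, p in enumerate(points)
            for q in points[i + 1:]]

random.seed(0)  # fixed seed so the sketch is repeatable
points = [(1.0, 2.0), (3.0, -1.0), (-2.0, 0.5), (0.0, 4.0)]  # sample data
theta = random.uniform(0.0, 2.0 * math.pi)  # the random rotation angle
perturbed = rotate(points, theta)

before = pairwise_distances(points)
after = pairwise_distances(perturbed)
distances_preserved = all(math.isclose(a, b, abs_tol=1e-9)
                          for a, b in zip(before, after))
```

A classifier whose decisions depend only on distances between records would therefore behave identically on `points` and `perturbed`, even though the individual coordinate values have changed.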

A Recovery Conscious Framework for Fault Resilient Storage Systems

2007 , Seshadri, Sangeetha , Liu, Ling , Chiu, Lawrence , Constantinescu, Cornel , Balachandran, Subashini

In this paper we present a recovery-conscious framework for improving the fault resiliency and recovery efficiency of highly concurrent embedded storage software systems. Our framework consists of a three-tier architecture and a suite of recovery conscious techniques. In the top tier, we promote fine-grained recovery at the task level by introducing recovery groups to model recovery dependencies between tasks. At the middle tier we develop highly effective mappings of dependent tasks to processor resources through careful tuning of recovery efficiency sensitive parameters. At the bottom tier, we advocate the use of recovery-conscious scheduling by careful serialization of dependent tasks, which provides high recovery efficiency without sacrificing system performance. We develop a formal model to guide the understanding and the development of techniques for effectively mapping fine-grained tasks to system resources, aiming at reducing the ripple effect of software failures while sustaining high performance even during system recovery. Our techniques have been implemented on a real industry-standard storage system. Experimental results show that our techniques are effective, non-intrusive and can significantly boost system resilience while delivering high performance.

CubeCache: Efficient and Scalable Processing of OLAP Aggregation Queries in a Peer-to-Peer Network

2007 , Seshadri, Sangeetha , Cooper, Brian F. , Liu, Ling

Peer-to-peer (P2P) data sharing systems are emerging as a promising infrastructure for collaborative data sharing among multiple geographically distributed data centers within a large enterprise. This paper presents CubeCache, a peer-to-peer system for efficiently serving OLAP queries and data cube aggregations in a distributed data warehouse system. CubeCache combines multiple client caches into a single query processing and caching system. Compared to existing peer-to-peer systems, the CubeCache solution has a number of unique features. First, we add a query processing layer to perform in-network data aggregation over peer caches. Second, we introduce the concept of Query-Trails: a cache listing recent data requestors. Query-Trails make it easier to find caches that are likely to have data needed for a query. Third, we design a benefit measure that incorporates the 'rarity' of a chunk into the notion of benefit, allowing controlled replication of chunks in a system plagued by frequent node departures or failures. We report the results of analysis and an experimental study, using simulations and an implemented prototype, that show the CubeCache solution reduces the server load, improves query throughput and reduces query latency for OLAP tasks.

Process Mining, Discovery, and Integration Using Distance Measures

2006 , Bae, Joonsoo , Caverlee, James , Liu, Ling , Rouse, William B.

Business processes continue to play an important role in today's service-oriented enterprise computing systems. Mining, discovering, and integrating process-oriented services have attracted growing attention in recent years. In this paper we present a quantitative approach to modeling and capturing the similarity and dissimilarity between different workflow designs. Concretely, we introduce a graph-based distance measure and a framework for utilizing this distance measure to mine the process repository and discover workflow designs that are similar to a given design pattern, or to produce one integrated workflow design by merging two or more business workflows of similar designs. We derive the similarity measures by analyzing the workflow dependency graphs of the participating workflow processes. Such an analysis is conducted in two phases. We first convert each workflow dependency graph into a normalized process network matrix. Then we calculate the metric space distance between the normalized matrices. This distance measure can be used as a quantitative and qualitative tool in process mining, process merging, and process clustering, and ultimately it can reduce or minimize the costs involved in design, analysis, and evolution of workflow systems.
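
The two-phase analysis described above can be sketched as follows. The abstract does not say how the matrix is normalized or which metric is used, so both choices here are assumptions: edge weights are divided by the total edge count, and the Frobenius norm of the difference serves as the distance. All function names and the sample workflows are hypothetical.

```python
import math

def normalized_matrix(edges, activities):
    """Phase 1: dependency graph -> adjacency matrix scaled to sum to 1.

    Assumption: normalization divides by the total number of edges.
    """
    idx = {a: i for i, a in enumerate(activities)}
    n = len(activities)
    m = [[0.0] * n for _ in range(n)]
    for src, dst in edges:
        m[idx[src]][idx[dst]] += 1.0
    total = sum(map(sum, m))
    if total:
        m = [[v / total for v in row] for row in m]
    return m

def frobenius_distance(a, b):
    """Phase 2: distance between two equally sized matrices."""
    return math.sqrt(sum((x - y) ** 2
                         for row_a, row_b in zip(a, b)
                         for x, y in zip(row_a, row_b)))

def workflow_distance(edges_a, edges_b):
    """Compare two workflows over the union of their activities."""
    activities = sorted({node for edge in edges_a + edges_b for node in edge})
    return frobenius_distance(normalized_matrix(edges_a, activities),
                              normalized_matrix(edges_b, activities))

# Hypothetical workflows given as lists of (predecessor, successor) edges.
order_ship = [("receive", "check"), ("check", "ship")]
order_bill = [("receive", "check"), ("check", "bill")]
contract = [("quote", "negotiate"), ("negotiate", "sign")]

d_similar = workflow_distance(order_ship, order_bill)
d_different = workflow_distance(order_ship, contract)
```

With these inputs, the two order workflows share one dependency and end up closer to each other than either is to the contract workflow, which is the ordering a repository search for similar designs would rely on.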

PRIVACYGRID: Supporting Anonymous Location Queries in Mobile Environments

2007 , Bamba, Bhuvan , Liu, Ling

We present PRIVACYGRID, a framework for supporting anonymous location-based queries in mobile information delivery systems. The PRIVACYGRID framework offers three unique capabilities. First, we provide a location privacy preference profile model, called location P3P, which allows mobile users to explicitly define their preferred location privacy requirements in terms of both location hiding measures (e.g., location k-anonymity and location l-diversity) and location service quality measures (e.g., maximum spatial resolution and maximum temporal resolution). Second, we develop three fast and effective location cloaking algorithms for providing location k-anonymity and location l-diversity in a mobile environment. The Quad Grid cloaking algorithm is fast but has a lower anonymization success rate. The dynamic bottom-up or top-down grid cloaking algorithms provide a much higher anonymization success rate and yet are efficient in terms of both time complexity and maintenance cost. Finally, we discuss a hybrid approach that combines the top-down and bottom-up search of location cloaking regions to further lower the average anonymization time. In addition, we argue for incorporating temporal cloaking into the location cloaking process to further increase the success rate of location anonymization. We also discuss the PRIVACYGRID mechanisms for anonymous support of range queries. Our experimental evaluation shows that the PRIVACYGRID approach can provide optimal location anonymity as defined by the per-user location P3P without introducing significant performance penalties.
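
As a rough illustration of grid-based cloaking for location k-anonymity, the sketch below walks down a quadtree from the whole map to the smallest quadrant that still contains the requesting user and at least k users in total. It is a generic quadtree descent written for this listing, not the Quad Grid algorithm from the paper. The function name `quad_cloak`, the `max_depth` limit and the sample coordinates are assumptions.

```python
def quad_cloak(user, others, k, bounds, max_depth=16):
    """Return the smallest quadrant (x0, y0, x1, y1) that contains `user`
    and at least k users in total, or None if even `bounds` has fewer.
    """
    def inside(p, b):
        # Cells are half-open so each point belongs to exactly one quadrant.
        return b[0] <= p[0] < b[2] and b[1] <= p[1] < b[3]

    population = [user] + list(others)
    if not inside(user, bounds):
        return None
    if sum(inside(p, bounds) for p in population) < k:
        return None

    cell = bounds
    for _ in range(max_depth):
        x0, y0, x1, y1 = cell
        mx, my = (x0 + x1) / 2, (y0 + y1) / 2
        # Pick the child quadrant that contains the user.
        child = (x0 if user[0] < mx else mx,
                 y0 if user[1] < my else my,
                 mx if user[0] < mx else x1,
                 my if user[1] < my else y1)
        if sum(inside(p, child) for p in population) < k:
            break  # going deeper would violate k-anonymity
        cell = child
    return cell

# Hypothetical positions on an 8 x 8 map.
me = (1, 1)
neighbours = [(1.5, 1.5), (3, 3), (6, 6), (7, 1)]
region_k3 = quad_cloak(me, neighbours, k=3, bounds=(0, 0, 8, 8))
region_k2 = quad_cloak(me, neighbours, k=2, bounds=(0, 0, 8, 8))
```

A larger k forces a larger cloaking region, which is the trade-off between location hiding and spatial resolution that the location P3P profile lets a user express. A fixed quadtree can only return cells aligned to its grid, which is one plausible reason a fixed-grid approach is less flexible than a dynamic one.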

What Where Wi: an Analysis of Millions of Wi-Fi Access Points

2006 , Jones, R. Kipp , Liu, Ling

With the growing demand for wireless Internet access and increasing maturity of IEEE 802.11 technologies, wireless networks have sprung up by the millions throughout the world as a popular means for Internet access at homes, in offices and in public areas, such as airports, cafés and coffee shops. An increasingly popular use of IEEE 802.11 networking equipment is to provide wireless "hotspots" as the wireless access points to the Internet. These wireless access points, commonly referred to as WAPs or simply APs, are installed and managed by individuals and businesses in an unregulated manner, allowing anyone to install and operate one of these radio devices using unlicensed radio spectrum. This has allowed literally millions of these APs to become available and "visible" to any interested party who happens to be within range of the radio waves emitted from the device. As the density of these APs increases, these "beacons" can be put to multiple uses. From home networking to wireless positioning to mesh networks, there are more alternative ways for connecting wirelessly as newer, longer-range technologies come to market. This paper reports an initial study that examines a database of over 5 million wireless access points collected through wardriving by Skyhook Wireless. By performing an analytical study of this data and the information it reveals, including the default naming behavior, movement of access points over time, and density of access points, we found that the AP data, coupled with location information, can provide fertile ground for understanding the "What, Where and Why" of Wi-Fi access points. More importantly, the analysis and mining of this vast and growing collection of AP data can yield important technological, social and economic results.

Scalable Access Control in Content-Based Publish-Subscribe Systems

2006 , Srivatsa, Mudhakar , Liu, Ling

Content-based publish-subscribe (pub-sub) systems are an emerging paradigm for building a large number of distributed systems. Access control in a pub-sub system refers to the secure distribution of events to clients subscribing to those events without revealing their secret attributes to unauthorized subscribers. To provide confidentiality guarantees, the secret attributes in an event are encrypted so that only authorized subscribers can read them. However, in a content-based pub-sub system, every event can potentially have a different set of authorized subscribers. In the worst case, for NS subscribers, there are 2^NS subgroups, and each event can potentially go to a different subgroup. Hence, efficient key management is a big challenge for implementing access control in pub-sub systems. In this paper, we describe efficient and scalable key management algorithms for securely implementing access control rules in pub-sub systems. We ensure that the key management cost is linear in the number of subscriptions and completely independent of the number of subscribers NS. We present a concrete implementation of our proposal on an operational pub-sub system. An experimental evaluation of our prototype shows that our proposal meets the security requirements while maintaining the scalability and performance of the pub-sub system.