Liu, Ling


Publication Search Results

Now showing 1 - 10 of 46
  • Item
    Agyaat: Providing Mutually Anonymous Services over Structured P2P Networks
    (Georgia Institute of Technology, 2004-03-23) Singh, Aameek ; Liu, Ling ; Center for Experimental Research in Computer Systems
    In the modern era of ubiquitous computing, privacy is one of the most critical user concerns. To protect their privacy, users typically try to remain anonymous to the service provider. This is especially true for decentralized Peer-to-Peer (P2P) systems, where common users act both as clients and as service providers. Preserving privacy in such cases requires mutual anonymity, which shields the users at both ends. Most unstructured P2P systems, like Gnutella and Kazaa, provide a certain level of anonymity through the use of a random overlay topology and a flooding-based routing protocol, but suffer from the lack of guaranteed lookup of data. In contrast, most structured P2P systems, like Chord, are Distributed Hash Table (DHT) based systems and provide guarantees that any stored data item can be found within a bounded number of hops. However, none of the existing DHT systems provide any mutual anonymity. In this paper, we present Agyaat - a decentralized P2P system that has the desired properties of privacy-preserving mutual anonymity and still achieves the performance benefits of scalable and guaranteed lookups. A unique characteristic of its design is its low-cost, yet highly effective approach to supporting mutual anonymity. Instead of adding explicit anonymity services to the network, Agyaat advocates the utilization of unstructured topologies, referred to as clouds, over structured DHT overlays. Cloud topologies have an important feature of local query termination, which is critical to facilitating mutual anonymity. To overcome the drawbacks of typical Gnutella-like systems, Agyaat introduces a number of novel mechanisms that enhance the scalability and efficiency of routing. Compared with existing pure DHT based systems, Agyaat provides mutual anonymity while ensuring similar routing performance (differing only by constants) in terms of both number of hops and aggregate messaging costs. We validate the Agyaat solution in two steps. First, we conduct a set of experiments to analyze the system performance and compare it with other popular pure DHT based systems. Second, we perform a thorough security (anonymity) analysis under the passive logging model. We discuss possible privacy compromising attacks and their impact, and propose various defenses to thwart such attacks.
  • Item
    Discovering and Ranking Data Intensive Web Services: A Source-Biased Approach
    (Georgia Institute of Technology, 2003) Caverlee, James ; Liu, Ling ; Rocco, Daniel J. (Daniel John) ; Center for Experimental Research in Computer Systems
    This paper presents a novel source-biased approach to automatically discover and rank relevant data intensive web services. It supports a service-centric view of the Web through source-biased probing and source-biased relevance detection and ranking metrics. Concretely, our approach is capable of answering source-centric queries by focusing on the nature and degree of the topical relevance of one service to others. This source-biased probing allows us to determine in very few interactions whether a target service is relevant to the source by probing the target with very precise probes and then ranking the relevant services discovered based on a set of metrics we define. Our metrics allow us to determine the nature and degree of the relevance of one service to another. We also introduce a performance enhancement to our basic approach called source-biased probing with focal terms. We also extend the basic probing framework to a more generalized service neighborhood graph model. We discuss the semantics of the neighborhood graph, how we may reason about the relationships among multiple services, and how we rank services based on the service neighborhood graph model. We also report initial experiments to show the effectiveness of our approach.
  • Item
    What Where Wi: an Analysis of Millions of Wi-Fi Access Points
    (Georgia Institute of Technology, 2006) Jones, R. Kipp ; Liu, Ling ; Center for Experimental Research in Computer Systems
    With the growing demand for wireless Internet access and increasing maturity of IEEE 802.11 technologies, wireless networks have sprung up by the millions throughout the world as a popular means for Internet access at homes, in offices and in public areas, such as airports, cafés and coffee shops. An increasingly popular use of IEEE 802.11 networking equipment is to provide wireless "hotspots" as the wireless access points to the Internet. These wireless access points, commonly referred to as WAPs or simply APs, are installed and managed by individuals and businesses in an unregulated manner, allowing anyone to install and operate one of these radio devices using unlicensed radio spectrum. This has allowed literally millions of these APs to become available and "visible" to any interested party who happens to be within range of the radio waves emitted from the device. As the density of these APs increases, these "beacons" can be put to multiple uses. From home networking to wireless positioning to mesh networks, there are more alternative ways for connecting wirelessly as newer, longer-range technologies come to market. This paper reports an initial study that examines a database of over 5 million wireless access points collected through wardriving by Skyhook Wireless. By performing an analytical study of this data and the information it reveals, including the default naming behavior, movement of access points over time, and density of access points, we found that the AP data, coupled with location information, can provide a fertile ground for understanding the "What, Where and Why" of Wi-Fi access points. More importantly, the analysis and mining of this vast and growing collection of AP data can yield important technological, social and economic results.
  • Item
    Energy Efficient Exact kNN Search in Wireless Broadcast Environments
    (Georgia Institute of Technology, 2004-05-24) Gedik, Bugra ; Singh, Aameek ; Liu, Ling ; Center for Experimental Research in Computer Systems
    The advances in wireless communication and decreasing costs of mobile devices have enabled users to access desired information at any time. Coupled with positioning technologies like GPS, this opens up an exciting domain of location based services, allowing a mobile user to query for objects based on its current position. The main bottlenecks in such infrastructures are the draining of power of the mobile devices and the limited network bandwidth available. To alleviate these problems, broadcasting spatial information about relevant objects has been widely accepted as an efficient mechanism. An important class of queries for such an infrastructure is the k-nearest neighbor (kNN) queries, in which users are interested in the k closest objects to their position. Most of the research on kNN queries uses unconventional broadcast indexes and provides only approximate kNN search. In this paper, we describe mechanisms to perform exact kNN search on conventional sequential-access R-trees, and optimize established kNN search algorithms. We also propose a novel use of histograms for guiding the search and derive analytical results on maximum queue size and node access count. In addition, we discuss the effects of different broadcast organizations on search performance and challenge the traditional use of Depth-First (dfs) organization. We also extend our mechanisms to support kNN search with non-spatial constraints. While we demonstrate our ideas using a broadcast index, they are equally applicable to any kind of sequential access medium like tertiary tape storage. We validate our mechanisms through an extensive experimental analysis and present our findings.
  • Item
    GRUBJOIN: An Adaptive Multi-Way Windowed Stream Join with Time Correlation-Aware CPU Load Shedding
    (Georgia Institute of Technology, 2005) Gedik, Bugra ; Wu, Kun-Lung ; Yu, Philip S. ; Liu, Ling ; Center for Experimental Research in Computer Systems
    Dropping tuples has been commonly used for load shedding. However, tuple dropping generally is inadequate to shed load for multi-way windowed stream joins. The output rate can be unnecessarily and severely degraded because tuple dropping does not recognize time correlations likely to exist among the streams. This paper introduces GrubJoin: an adaptive multi-way windowed stream join that efficiently performs time correlation-aware CPU load shedding. GrubJoin maximizes the output rate by achieving near-optimal window harvesting within an operator throttling framework, i.e., regulating the fractions of the join windows that are processed by the multi-way join. Window harvesting performs the join using only certain more useful segments of the join windows. Due mainly to the combinatorial explosion of possible multi-way join sequences involving various segments of individual join windows, GrubJoin faces a set of unique challenges, such as determining the optimal window harvesting configuration and learning the time correlations among the streams. To tackle these challenges, we formalize window harvesting as an optimization problem, develop greedy heuristics to determine near-optimal window harvesting configurations and use approximation techniques to capture the time correlations among the streams. Experimental results show that GrubJoin is vastly superior to tuple dropping when time correlations exist among the streams and is equally effective as tuple dropping in the absence of time correlations.
  • Item
    Improving Peer to Peer Search With Multi-Tier Capability-Aware Overlay Topologies
    (Georgia Institute of Technology, 2003) Srivatsa, Mudhakar ; Gedik, Bugra ; Liu, Ling ; Center for Experimental Research in Computer Systems
    The P2P model has many potential advantages (e.g., large scale, fault-tolerance, low cost of administration and maintenance) due to the design flexibility of overlay networks and the decentralized management of cooperative sharing of information and resources. However, the mismatch between the randomly constructed overlay network topology (combined with its broadcast-style message forwarding infrastructure) and the underlying packet routing introduces difficult performance problems, exemplified by the Short-Cut Effect. This paper presents two peer-to-peer (P2P) system-level facilities to address the problems. First, we propose a capability-aware mechanism to structure the overlay topology in the form of layers that takes peer heterogeneity into account. Second, we develop a Probabilistic Broadening search technique, empowered with a capability-sensitive query forwarding scheme, which integrates gracefully with result caching techniques to improve the search performance of a P2P system. We believe that efforts on bridging the gap (mismatch) between overlay networks and the underlying Internet will bring P2P services beyond pure "best effort" and closer to serious applications with quality of service requirements.
  • Item
    Efficient and Secure Search of Enterprise File Systems
    (Georgia Institute of Technology, 2007) Singh, Aameek ; Srivatsa, Mudhakar ; Liu, Ling ; Center for Experimental Research in Computer Systems
    With the fast-paced growth of enterprise data, quickly locating relevant content has become a critical IT capability. Research has shown that nearly 85% of enterprise data lies in flat filesystems [12] that allow multiple users and user groups with different access privileges to underlying data. Any search tool for such large scale systems needs to be efficient and yet cognizant of the access control semantics imposed by the underlying filesystem. Current multiuser enterprise search techniques use two disjoint search and access-control components by creating a single system-wide index and simply filtering search results for access control. This approach is ineffective as the index and query statistics subtly leak private information. The other available approach of using separate indices for each user is undesirable as it not only increases disk consumption due to shared files, but also increases the overheads of updating the indices whenever a file changes. We propose a distributed approach that couples search and access-control into a unified framework and provides secure multiuser search. Our scheme (logically) divides data into independent access-privileges based chunks, called access-control barrels (ACB). ACBs not only manage security but also improve overall efficiency as they can be indexed and searched in parallel by distributing them to multiple enterprise machines. We describe the architecture of the ACB-based search framework and propose two optimization techniques that ensure the scalability of our approach. We also discuss other useful features of our approach – seamless integration with desktop search and an extension to provide secure search in untrusted storage service provider environments. We validate our approach with a detailed evaluation using industry benchmarks and real datasets. Our initial experiments show secure search with 38% improved indexing efficiency and low overheads for ACB processing.
  • Item
    Towards Finding Optimal Partitions of Categorical Datasets
    (Georgia Institute of Technology, 2003) Chen, Keke ; Liu, Ling ; College of Computing
    A considerable amount of work has been dedicated to clustering numerical data sets, but only a handful of categorical clustering algorithms are reported to date. Furthermore, almost none has addressed the following two important cluster validity problems: (1) Given a data set and a clustering algorithm that partitions the data set into k clusters, how can we determine the best k with respect to the given data set? (2) Given a data set and a set of clustering algorithms with a fixed k, how can we determine which one will produce k clusters of the best quality? In this paper, we investigate the entropy and expected-entropy concepts for clustering categorical data, and propose a cluster validity method based on the characteristics of expected-entropy. In addition, we develop an agglomerative hierarchical algorithm (HierEntro) to incorporate the proposed cluster validity method into the clustering process. We report our initial experimental results showing the effectiveness of the proposed clustering validity method and the benefits of the HierEntro clustering algorithm.
  • Item
    Mondrian Tree: Efficient Indexing Structure for Scalable Spatial Triggers Processing over Mobile Environment
    (Georgia Institute of Technology, 2010) Doo, Myungcheol ; Liu, Ling ; Narasimhan, Nitya ; Vasudevan, Venu ; Center for Experimental Research in Computer Systems ; Georgia Institute of Technology. Center for Experimental Research in Computer Systems ; Georgia Institute of Technology. School of Computer Science ; Motorola, inc. Applied Research Center
    Spatial Alarms are reminders for mobile users upon their arrival at certain spatial locations of interest. Spatial alarm processing requires meeting two demanding objectives: high accuracy, which ensures zero or very low alarm misses, and high scalability, which requires highly efficient and optimal processing of spatial alarms. Existing techniques for processing spatial alarms cannot meet these two objectives at the same time. In this paper we present the design and implementation of a new indexing technique, the Mondrian tree. The Mondrian tree indexing method partitions the entire universe of discourse into spatial alarm monitoring regions and alarm-free regions. This enables us to reduce the number of on-demand alarm-free region computations, yielding significant savings in both server load and client-to-server communication cost. We evaluate the efficiency of the Mondrian tree indexing approach using a road network simulator and show that the Mondrian tree offers significant performance enhancements on spatial alarm processing at both the server side and the client side.
  • Item
    Detecting the Change of Clustering Structure in Categorical Data Streams
    (Georgia Institute of Technology, 2005) Chen, Keke ; Liu, Ling ; College of Computing
    Clustering data streams can provide critical information for making decisions in real time. We argue that detecting the change of clustering structure in the data streams can be beneficial to many real-time monitoring applications. In this paper, we present a framework for detecting changes of clustering structure in categorical data streams. The change of clustering structure is detected by the change of the best number of clusters in the data stream. The framework consists of two main components: the BkPlot method for determining the best number of clusters in a categorical dataset, and the summarization structure, Hierarchical Entropy Tree (HE-Tree), for efficiently capturing the entropy property of the categorical data streams. HE-Tree enables us to quickly and precisely draw the clustering information from the data stream that is needed by the BkPlot method to identify the change of the best number of clusters. Combining the snapshots of the HE-Tree information and the BkPlot method, we are able to observe the change of clustering structure online. The experiments show that the HE-Tree + BkPlot method can efficiently and precisely detect the change of clustering structure in categorical data streams.