Person:

Liu, Ling

Permanent Link

https://hdl.handle.net/1853/71467

Associated Organization(s)

Organizational Unit

School of Computer Science

Publication Search Results

What Where Wi: an Analysis of Millions of Wi-Fi Access Points

(Georgia Institute of Technology, 2006) Jones, R. Kipp ; Liu, Ling

With the growing demand for wireless Internet access and increasing maturity of IEEE 802.11 technologies, wireless networks have sprung up by the millions throughout the world as a popular means for Internet access at homes, in offices and in public areas, such as airports, cafés and coffee shops. An increasingly popular use of IEEE 802.11 networking equipment is to provide wireless "hotspots" as the wireless access points to the Internet. These wireless access points, commonly referred to as WAPs or simply APs, are installed and managed by individuals and businesses in an unregulated manner, allowing anyone to install and operate one of these radio devices using unlicensed radio spectrum. This has allowed literally millions of these APs to become available and "visible" to any interested party who happens to be within range of the radio waves emitted from the device. As the density of these APs increases, these "beacons" can be put into multiple uses. From home networking to wireless positioning to mesh networks, there are more alternative ways for connecting wirelessly as newer, longer-range technologies come to market. This paper reports an initial study that examines a database of over 5 million wireless access points collected through wardriving by Skyhook Wireless. By performing the analytical study of this data and the information revealed by this data, including the default naming behavior, movement of access points over time, and density of access points, we found that the AP data, coupled with location information, can provide a fertile ground for understanding the "What, Where and Why" of Wi-Fi access points. More importantly, the analysis and mining of this vast and growing collection of AP data can yield important technological, social and economical results.
LIRA: Lightweight, Region-aware Load Shedding in Mobile CQ Systems

(Georgia Institute of Technology, 2006) Gedik, Bugra ; Liu, Ling ; Wu, Kun-Lung ; Yu, Philip S.

Position updates and query re-evaluations are two predominant, costly components of processing location-based, continual queries (CQs) in mobile systems. To obtain high-quality query results, the query processor usually demands receiving frequent position updates from the mobile nodes. However, processing frequent updates oftentimes causes the query processor to become overloaded, under which updates must be dropped randomly, bringing down the quality of query results, negating the benefits of frequent position updates. In this paper, we develop LIRA, a lightweight, region-aware load-shedding technique for preventively reducing the position-update load of a query processor, while maintaining high-quality query results. Instead of having to receive too many updates and then randomly drop some of them, LIRA uses a region-aware partitioning mechanism to identify the most beneficial shedding regions to cut down the position updates sent by the mobile nodes within those regions. Based on the number of mobile nodes and queries in a region, LIRA judiciously applies different amounts of update reduction for different regions, maintaining better overall accuracy of query results. Experimental results show that LIRA is vastly superior to random update dropping and clearly outperforms other alternatives that do not possess full-scale, region-aware load-shedding capabilities. Moreover, due to its lightweight nature, LIRA introduces very little overhead.
Process Mining, Discovery, and Integration Using Distance Measures

(Georgia Institute of Technology, 2006) Bae, Joonsoo ; Caverlee, James ; Liu, Ling ; Rouse, William B.

Business processes continue to play an important role in today's service-oriented enterprise computing systems. Mining, discovering, and integrating process-oriented services has attracted growing attention in recent years. In this paper we present a quantitative approach to modeling and capturing the similarity and dissimilarity between different workflow designs. Concretely, we introduce a graph-based distance measure and a framework for utilizing this distance measure to mine the process repository and discover workflow designs that are similar to a given design pattern or to produce one integrated workflow design by merging two or more business workflows of similar designs. We derive the similarity measures by analyzing the workflow dependency graphs of the participating workflow processes. Such an analysis is conducted in two phases. We first convert each workflow dependency graph into a normalized process network matrix. Then we calculate the metric space distance between the normalized matrices. This distance measure can be used as a quantitative and qualitative tool in process mining, process merging, and process clustering, and ultimately it can reduce or minimize the costs involved in design, analysis, and evolution of workflow systems.
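
The two-phase analysis described in this abstract can be illustrated with a minimal sketch. The abstract does not give the normalization rule or the metric, so the choices below are assumptions made only for illustration: each dependency graph becomes an adjacency matrix over the union of activity names, entries are divided by the total edge count, and the distance is the Frobenius (Euclidean) norm of the difference. The function names `normalized_matrix` and `workflow_distance` are hypothetical.

```python
import math


def normalized_matrix(edges, activities):
    """Build an adjacency matrix over `activities` and scale it to sum to 1.

    Assumption: dividing by the total edge count stands in for the
    paper's (unspecified here) normalization step.
    """
    index = {name: i for i, name in enumerate(activities)}
    n = len(activities)
    matrix = [[0.0] * n for _ in range(n)]
    for src, dst in edges:
        matrix[index[src]][index[dst]] += 1.0
    total = sum(sum(row) for row in matrix)
    if total > 0:
        matrix = [[value / total for value in row] for row in matrix]
    return matrix


def workflow_distance(edges_a, edges_b):
    """Frobenius distance between two normalized dependency matrices.

    Assumption: the Frobenius norm is used as the metric-space distance.
    """
    activities = sorted({name for edge in list(edges_a) + list(edges_b) for name in edge})
    ma = normalized_matrix(edges_a, activities)
    mb = normalized_matrix(edges_b, activities)
    return math.sqrt(
        sum((x - y) ** 2 for row_a, row_b in zip(ma, mb) for x, y in zip(row_a, row_b))
    )


# Two small, made-up workflows that share their first step.
order = [("receive", "check"), ("check", "ship")]
variant = [("receive", "check"), ("check", "bill")]
print(workflow_distance(order, order))    # identical designs give 0.0
print(workflow_distance(order, variant))  # differing designs give a positive distance
```

Because both matrices are indexed over the same activity set, the distance is symmetric and is zero exactly when the normalized matrices coincide, which is the property a repository search for similar designs would rely on.
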
Scalable Access Control in Content-Based Publish-Subscribe Systems

(Georgia Institute of Technology, 2006) Srivatsa, Mudhakar ; Liu, Ling

Content-based publish-subscribe (pub-sub) systems are an emerging paradigm for building a large number of distributed systems. Access control in a pub-sub system refers to secure distribution of events to clients subscribing to those events without revealing their secret attributes to unauthorized subscribers. To provide confidentiality guarantees, the secret attributes in an event are encrypted so that only authorized subscribers can read them. However, in a content-based pub-sub system, every event can potentially have a different set of authorized subscribers. In the worst case, for NS subscribers, there are 2^NS subgroups, and each event can potentially go to a different subgroup. Hence, efficient key management is a big challenge for implementing access control in pub-sub systems. In this paper, we describe efficient and scalable key management algorithms for securely implementing access control rules in pub-sub systems. We ensure that the key management cost is linear in the number of subscriptions and completely independent of the number of subscribers NS. We present a concrete implementation of our proposal on an operational pub-sub system. An experimental evaluation of our prototype shows that our proposal meets the security requirements while maintaining the scalability and performance of the pub-sub system.
A Random Rotation Perturbation Approach to Privacy Preserving Data Classification

(Georgia Institute of Technology, 2005) Chen, Keke ; Liu, Ling

This paper presents a random rotation perturbation approach for privacy preserving data classification. Concretely, we identify the importance of classification-specific information with respect to the loss of information factor, and present a random rotation perturbation framework for privacy preserving data classification. Our approach has two unique characteristics. First, we identify that many classification models utilize the geometric properties of datasets, which can be preserved by geometric rotation. We prove that the three types of classifiers will deliver the same performance over the rotation perturbed dataset as over the original dataset. Second, we propose a multi-column privacy model to address the problems of evaluating privacy quality for multidimensional perturbation. With this metric, we develop a locally optimal algorithm to find a good rotation perturbation in terms of privacy guarantee. We also analyze both naive estimation and ICA-based reconstruction attacks with the privacy model. Our initial experiments show that the random rotation approach can provide a high privacy guarantee while maintaining zero loss of accuracy for the discussed classifiers.
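
The geometric claim in this abstract, that rotation preserves the properties many classifiers depend on, can be checked with a small sketch. It is restricted to two dimensions and a single rotation angle, which is an illustrative simplification rather than the paper's multidimensional framework; the helper names `rotate_2d` and `pairwise_distances` are hypothetical.

```python
import math
import random


def rotate_2d(points, theta):
    """Rotate each (x, y) point about the origin by angle theta (radians)."""
    c, s = math.cos(theta), math.sin(theta)
    return [(c * x - s * y, s * x + c * y) for x, y in points]


def pairwise_distances(points):
    """Euclidean distance for every unordered pair, in a fixed order."""
    return [
        math.dist(p, q)
        for i, p in enumerate(points)
        for q in points[i + 1:]
    ]


# Made-up records with two numeric attributes each.
records = [(1.0, 2.0), (3.0, -1.0), (-2.0, 0.5)]
theta = random.uniform(0.0, 2.0 * math.pi)  # random rotation angle
perturbed = rotate_2d(records, theta)

# Attribute values change, but pairwise distances agree up to
# floating-point rounding.
for before, after in zip(pairwise_distances(records), pairwise_distances(perturbed)):
    assert abs(before - after) < 1e-9
```

A distance-based classifier such as nearest neighbor therefore ranks neighbors identically on the perturbed records, while the published attribute values no longer match the originals.
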
Adaptive Load Shedding for Windowed Stream Joins

(Georgia Institute of Technology, 2005) Gedik, Bugra ; Wu, Kun-Lung ; Yu, Philip S. ; Liu, Ling

We present an adaptive load shedding approach for windowed stream joins. In contrast to the conventional approach of dropping tuples from the input streams, we explore the concept of selective processing for load shedding, focusing on costly stream joins such as those over set-valued or weighted set-valued attributes. The main idea of our adaptive load shedding approach is two-fold. First, we allow stream tuples to be stored in the windows and shed excessive CPU load by performing the stream join operations, not on the entire set of tuples within the windows, but on a dynamically changing subset of tuples that are highly beneficial. Second, we support such dynamic selective processing through three forms of runtime adaptations: By adaptation to input stream rates, we perform partial processing based load shedding and dynamically determine the fraction of the windows to be processed by comparing the tuple consumption rate of join operation to the incoming stream rates. By adaptation to time correlation between the streams, we dynamically determine the number of basic windows to be used and prioritize the tuples for selective processing, encouraging CPU-limited execution of stream joins in high priority basic windows. By adaptation to join directions, we dynamically determine the most beneficial direction to perform stream joins in order to process more useful tuples under heavy load conditions and boost the utility or number of output tuples produced. Our load shedding framework not only enables us to integrate utility-based load shedding with time correlation-based load shedding, but more importantly, it also allows load shedding to be adaptive to various dynamic stream properties. Inverted indexes are used to further speed up the execution of stream joins based on set-valued attributes. Experiments are conducted to evaluate the effectiveness of our adaptive load shedding approach in terms of output rate and utility.
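
The first adaptation, choosing what fraction of each window to process by comparing the join's tuple consumption rate with the incoming stream rate, can be sketched as a simple feedback rule. The multiplicative update, the lower bound `min_fraction`, and the function name `adjust_fraction` are assumptions made for illustration; the abstract does not specify the actual rule.

```python
def adjust_fraction(fraction, consumed_rate, arrival_rate, min_fraction=0.01):
    """Return an updated fraction of the join windows to process.

    Assumption: the fraction is scaled by the ratio of tuples consumed
    per second to tuples arriving per second, then clamped to
    [min_fraction, 1.0].
    """
    if arrival_rate <= 0:
        return 1.0  # nothing is arriving, so full processing is affordable
    ratio = consumed_rate / arrival_rate
    return max(min_fraction, min(1.0, fraction * ratio))


# The join keeps up with only half of the input: process half the windows.
fraction = adjust_fraction(1.0, consumed_rate=500, arrival_rate=1000)
# Later the join has spare capacity: grow the processed fraction again.
fraction = adjust_fraction(fraction, consumed_rate=1200, arrival_rate=1000)
```

Tuples stay in the windows either way; only the portion of each window that the join scans shrinks or grows, which is what distinguishes this kind of selective processing from dropping input tuples.
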
GRUBJOIN: An Adaptive Multi-Way Windowed Stream Join with Time Correlation-Aware CPU Load Shedding

(Georgia Institute of Technology, 2005) Gedik, Bugra ; Wu, Kun-Lung ; Yu, Philip S. ; Liu, Ling

Dropping tuples has been commonly used for load shedding. However, tuple dropping generally is inadequate to shed load for multi-way windowed stream joins. The output rate can be unnecessarily and severely degraded because tuple dropping does not recognize time correlations likely to exist among the streams. This paper introduces GrubJoin: an adaptive multi-way windowed stream join that efficiently performs time correlation-aware CPU load shedding. GrubJoin maximizes the output rate by achieving near-optimal window harvesting within an operator throttling framework, i.e., regulating the fractions of the join windows that are processed by the multi-way join. Window harvesting performs the join using only certain more useful segments of the join windows. Due mainly to the combinatorial explosion of possible multi-way join sequences involving various segments of individual join windows, GrubJoin faces a set of unique challenges, such as determining the optimal window harvesting configuration and learning the time correlations among the streams. To tackle these challenges, we formalize window harvesting as an optimization problem, develop greedy heuristics to determine near-optimal window harvesting configurations and use approximation techniques to capture the time correlations among the streams. Experimental results show that GrubJoin is vastly superior to tuple dropping when time correlations exist among the streams and is equally effective as tuple dropping in the absence of time correlations.
Detecting the Change of Clustering Structure in Categorical Data Streams

(Georgia Institute of Technology, 2005) Chen, Keke ; Liu, Ling

Clustering data streams can provide critical information for making decisions in real time. We argue that detecting the change of clustering structure in the data streams can be beneficial to many real-time monitoring applications. In this paper, we present a framework for detecting changes of clustering structure in categorical data streams. The change of clustering structure is detected by the change of the best number of clusters in the data stream. The framework consists of two main components: the BkPlot method for determining the best number of clusters in a categorical dataset, and the summarization structure, Hierarchical Entropy Tree (HE-Tree), for efficiently capturing the entropy property of the categorical data streams. HE-Tree enables us to quickly and precisely draw the clustering information from the data stream that is needed by the BkPlot method to identify the change of the best number of clusters. Combining the snapshots of the HE-Tree information and the BkPlot method, we are able to observe the change of clustering structure online. The experiments show that the HE-Tree + BkPlot method can efficiently and precisely detect the change of clustering structure in categorical data streams.
Energy-Aware Data Collection in Sensor Networks: A Localized Selective Sampling Approach

(Georgia Institute of Technology, 2005) Gedik, Bugra ; Liu, Ling

One of the most prominent and comprehensive ways of data collection in sensor networks is to periodically extract raw sensor readings. This way of data collection enables complex analysis of data, which may not be possible with in-network aggregation or query processing. However, this flexibility in data analysis comes at the cost of power consumption. In this paper, we introduce selective sampling for energy-efficient periodic data collection in sensor networks. The main idea behind selective sampling is to use a dynamically changing subset of nodes as samplers such that the sensor readings of sampler nodes are directly collected, whereas the values of non-sampler nodes are predicted through the use of probabilistic models that are locally and periodically constructed in an in-network manner. Selective sampling can be effectively used to increase the network lifetime while keeping the quality of the collected data high, in scenarios where either the spatial density of the network deployment is superfluous relative to the required spatial resolution for data analysis or a certain amount of data quality can be traded off in order to decrease the overall power consumption of the network. Our selective sampling approach consists of three main mechanisms. First, sensing-driven cluster construction is used to create clusters within the network such that nodes with close sensor readings are assigned to the same clusters. Second, correlation-based sampler selection and model derivation is used to determine the sampler nodes and to calculate the parameters of probabilistic models that capture the spatial and temporal correlations among sensor readings. Last, selective data collection and model-based prediction is used to minimize the number of messages used to extract data from the network.
A unique feature of our selective sampling mechanisms is the use of localized schemes, as opposed to protocols requiring global information, to select and dynamically refine the subset of sensor nodes serving as samplers and the model-based value prediction for non-sampler nodes. Such runtime adaptations create a data collection schedule which is self-optimizing in response to changes in energy levels of nodes and environmental dynamics.
Energy Efficient Exact kNN Search in Wireless Broadcast Environments

(Georgia Institute of Technology, 2004-05-24) Gedik, Bugra ; Singh, Aameek ; Liu, Ling

The advances in wireless communication and decreasing costs of mobile devices have enabled users to access desired information at any time. Coupled with positioning technologies like GPS, this opens up an exciting domain of location based services, allowing mobile users to query for objects based on their current position. The main bottlenecks in such infrastructures are the draining of power of the mobile devices and the limited network bandwidth available. To alleviate these problems, broadcasting spatial information about relevant objects has been widely accepted as an efficient mechanism. An important class of queries for such an infrastructure is the k-nearest neighbor (kNN) queries, in which users are interested in the k closest objects to their position. Most research on kNN queries uses unconventional broadcast indexes and provides only approximate kNN search. In this paper, we describe mechanisms to perform exact kNN search on conventional sequential-access R-trees, and optimize established kNN search algorithms. We also propose a novel use of histograms for guiding the search and derive analytical results on maximum queue size and node access count. In addition, we discuss the effects of different broadcast organizations on search performance and challenge the traditional use of Depth-First (DFS) organization. We also extend our mechanisms to support kNN search with non-spatial constraints. While we demonstrate our ideas using a broadcast index, they are equally applicable to any kind of sequential access medium like tertiary tape storage. We validate our mechanisms through an extensive experimental analysis and present our findings.
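
The idea of exact kNN search over a sequential-access medium can be sketched in simplified form. This is not the paper's R-tree algorithm: it assumes a flat sequence of leaf buckets, each announced by its bounding box, arriving in broadcast order, and it skips any bucket whose box cannot contain a point closer than the current k-th best. The names `min_dist` and `knn_sequential` and the bucket layout are hypothetical.

```python
import heapq
import math


def min_dist(q, box):
    """Smallest possible distance from point q to an axis-aligned box."""
    xmin, ymin, xmax, ymax = box
    dx = max(xmin - q[0], 0.0, q[0] - xmax)
    dy = max(ymin - q[1], 0.0, q[1] - ymax)
    return math.hypot(dx, dy)


def knn_sequential(q, buckets, k):
    """Exact k nearest neighbors of q from buckets read strictly in order.

    Returns the neighbors (closest first) and the number of buckets whose
    contents were actually read, a rough stand-in for listening cost.
    """
    best = []  # max-heap of the k best so far, stored as (-distance, point)
    buckets_read = 0
    for box, points in buckets:
        # Skip a bucket that cannot improve on the current k-th neighbor.
        if len(best) == k and min_dist(q, box) >= -best[0][0]:
            continue
        buckets_read += 1
        for p in points:
            d = math.dist(q, p)
            if len(best) < k:
                heapq.heappush(best, (-d, p))
            elif d < -best[0][0]:
                heapq.heapreplace(best, (-d, p))
    ordered = sorted((-neg_d, p) for neg_d, p in best)
    return [p for _, p in ordered], buckets_read


# Made-up broadcast cycle: (bounding box, points inside it).
cycle = [
    ((0, 0, 2, 2), [(1, 1), (2, 2)]),
    ((10, 10, 12, 12), [(10, 10), (12, 12)]),
    ((1, 0, 2, 1), [(1, 0), (2, 1)]),
]
neighbors, buckets_read = knn_sequential((0, 0), cycle, k=2)
print(neighbors, buckets_read)
```

The result stays exact because a bucket is skipped only when its minimum possible distance is no better than the k-th neighbor already held. Skipped buckets are the ones a mobile client would not need to listen to, which is where the energy saving would come from.
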