Person:

Navathe, Shamkant B.

Permanent Link

https://hdl.handle.net/1853/71549

Associated Organization(s)

Organizational Unit

School of Computer Science

Publication Search Results

A Clustering Algorithm to Discover Low and High Density Hyper-Rectangles in Subspaces of Multidimensional Data.

(Georgia Institute of Technology, 1999) Omiecinski, Edward ; Navathe, Shamkant B. ; Ezquerra, Norberto F. ; Ordóñez, Carlos

This paper presents a clustering algorithm to discover low and high density regions in subspaces of multidimensional data for Data Mining applications. High density regions generally refer to typical cases, whereas low density regions indicate infrequent and thus rare cases. For typical applications there is a large number of low density regions, and only a few of these are interesting. Regions are considered interesting when they have a minimum "volume" and involve some maximum number of dimensions. Our algorithm discovers high density regions (clusters) and low density regions (outliers, negative clusters, holes, empty regions) at the same time. In particular, our algorithm can find empty regions; that is, regions having no data points. The proposed algorithm is fast and simple. There is a large variety of applications in medicine, marketing, astronomy, finance, etc., where interesting and exceptional cases correspond to the low and high density regions discovered by our algorithm.
Toward A Method of Grouping Server Data Fragments for Improving Scalability in Intermittently Synchronized Databases

(Georgia Institute of Technology, 1999) Yee, Wai Gen ; Donahoo, Michael J. ; Navathe, Shamkant B.

We consider the class of mobile computing applications with periodically connected clients. These clients wish to share data; however, due to the expense of mobile communication, they only connect periodically -- and not necessarily synchronously -- to a common network. Traditionally, a continuously-connected server, containing an aggregate of client data, facilitates sharing amongst clients by allowing the clients to upload local updates and download updates submitted by other clients. The server computes and transmits these updates on a client-by-client basis; consequently, the complexity of these operations is on the order of the number of clients, limiting scalability. Recent research proposes exploiting client data overlap by grouping updates according to how the data is shared amongst clients (data-centric) instead of on a client-by-client basis (client-centric). Each client downloads updates for the relevant set of groups. By grouping, update operation distribution is computed only once per group, irrespective of the number of clients downloading a particular group's updates. Additionally, we may gain bandwidth scalability by employing broadcast delivery since, unlike the case in the per-client approach, multiple clients may be interested in a group's updates. Clearly, group composition directly affects the scalability of this approach. Given a relative cost of resources such as server processing, bandwidth, and storage space, we focus on developing a group derivation approach that significantly improves the scalability of the resources. We construct a formal specification of this problem and discuss the intractability of an optimal solution. Based on observations from the specification, we derive a heuristically based approach and evaluate its efficacy with respect to the client-centric approach. 
We run experiments on an implemented system, which demonstrate that, as the amount of overlap between client subscriptions increases, the data-centric approach with groups generated by our heuristic-based algorithm yields significant cost reduction when compared to the traditional client-centric approach.
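The contrast between client-centric and data-centric update distribution described in this abstract can be illustrated with a minimal sketch. It assumes the simplest possible grouping rule: fragments subscribed to by exactly the same set of clients form one group. This rule, the function names, and the input shape are all hypothetical and chosen for illustration; they are not the heuristic developed in the paper.

```python
from collections import defaultdict


def group_fragments(subscriptions):
    """Group fragments that are shared by exactly the same set of clients.

    subscriptions: dict mapping client id -> set of fragment ids
    (a hypothetical input shape, not taken from the paper).
    Returns a dict mapping frozenset(client ids) -> set of fragment ids.
    """
    # Invert the subscriptions: which clients are interested in each fragment?
    subscribers = defaultdict(set)
    for client, fragments in subscriptions.items():
        for fragment in fragments:
            subscribers[fragment].add(client)

    # Fragments with identical subscriber sets end up in the same group.
    groups = defaultdict(set)
    for fragment, clients in subscribers.items():
        groups[frozenset(clients)].add(fragment)
    return dict(groups)


def groups_for_client(groups, client):
    """Return the fragment groups a given client has to download."""
    return [frags for clients, frags in groups.items() if client in clients]


# Four clients with heavily overlapping interests (illustrative data).
subscriptions = {
    "c1": {"f1", "f2", "f3"},
    "c2": {"f1", "f2"},
    "c3": {"f1", "f2"},
    "c4": {"f1", "f2"},
}
groups = group_fragments(subscriptions)

# Client-centric: one update file per client; data-centric: one per group.
print("client-centric update files:", len(subscriptions))
print("data-centric update files:  ", len(groups))
```

With this data the server prepares two group update files instead of four per-client files, and each client still receives exactly the fragments it subscribed to. With little overlap between subscriptions, the same rule can produce more groups than clients, which is consistent with the abstract's point that group composition directly affects scalability.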
A Mathematical Optimization Approach To Improve Server Scalability In Intermittently Synchronized Databases

(Georgia Institute of Technology, 1999) Yee, Wai Gen ; Navathe, Shamkant B. ; Datta, Anindya ; Mitra, Sabyasachi

This paper addresses a scalability problem in the process of synchronizing the states of multiple client databases which only have deferred access to the server. It turns out that the process of client update file generation is not scalable with the number of clients served. In this paper we concentrate on developing an optimization model to address the scalability problem at the server by aiming for an optimal grouping of data fragments at the server given the "interest sets" of the clients - the set of fragments the client deals with for its "local" processing. The objective is to minimize the total cost of server operation, which includes processing updates from all clients and the transmission cost of sending the right set of updates to each client based on the client's interest set. An integer programming formulation is developed and solved with an illustrative problem, yielding interesting results.
A Greedy Approach For Improving Update Processing In Intermittently Synchronized Databases

(Georgia Institute of Technology, 1999) Omiecinski, Edward ; Navathe, Shamkant B. ; Ammar, Mostafa H. ; Donahoo, Michael J. ; Malik, Sanjoy ; Yee, Wai Gen

Replication of data on portable computers is a new DBMS technology aimed at catering to a growing population of mobile database users. Clients can download data items such as email or sales data from a server onto these machines, peruse it during commutes, and return any modifications to the server at the end of the day. In this paper, we describe how the servers in these systems generally process update information for clients and reveal a scalability problem--server processing increases quadratically with respect to increasing numbers of clients. We develop a cost model, and propose a solution based on heuristics. By aggregating client interests into datagroups, based on notions such as interest overlap, we can reduce server cost. These techniques are attractive because they are simple and computationally cheap. Simulations show that even simple techniques may yield significant performance improvements.
A knowledge-based approach to integrating and querying distributed heterogeneous information systems

(Georgia Institute of Technology, 1995) Navathe, Shamkant B.
The Impact Of Data Placement Strategies On Reorganization Costs In Parallel Databases

(Georgia Institute of Technology, 1995) Omiecinski, Edward ; Navathe, Shamkant B. ; Achyutuni, Kiran Jyotsna

In this paper, we study the data placement problem from a reorganization point of view. Effective placement of the declustered fragments of a relation is crucial to the performance of parallel database systems having multiple disks. Given the dynamic nature of database systems, the optimal placement of fragments will change over time and this will necessitate a reorganization in order to maintain the performance of the database system at acceptable levels. This study shows that the choice of a data placement strategy can have a significant impact on the reorganization costs. Up until now, data placement heuristics were designed with the principal purpose of balancing the load. However, this paper shows that such a policy can be beneficial only in the short term. Long term database designs should take reorganization costs into consideration while making design choices.
Querying, Navigating and Visualizing an Online Library Catalog

(Georgia Institute of Technology, 1995) Veerasamy, Aravindan ; Hudson, Scott E. ; Navathe, Shamkant B.

We describe the design of a user interface for a ranked output Information Retrieval system that integrates querying, navigation and visualization in a seamless fashion. Highlights of the system include the following:
-- Using a visualization scheme, the interface provides visual feedback to the user about how the query words influence the ranking of retrieved documents.
-- By simple drag-and-drop operations of objects on the screen, the interface facilitates a naive end-user in constructing complex structured queries and in providing relevance feedback.
-- To suit the evolving information needs of the user, the interface supports navigational features such as browsing documents by specific authors and browsing the Table of Contents of publications.
-- The interface integrates an online thesaurus which provides words related to the query that can be used by the user to expand the original query.
By providing a rich set of features, the interface coherently supports a wide spectrum of information gathering tactics for different classes of users.
An Efficient Algorithm for Mining Association Rules in Large Databases

(Georgia Institute of Technology, 1995) Omiecinski, Edward ; Navathe, Shamkant B. ; Savasere, Ashok

Mining for association rules between items in a large database of sales transactions has been described as an important database mining problem. In this paper we present an efficient algorithm for mining association rules that is fundamentally different from known algorithms. Compared to the previous algorithms, our algorithm reduces both CPU and I/O overheads. In our experimental study it was found that for large databases, the CPU overhead was reduced by as much as a factor of seven and I/O was reduced by almost an order of magnitude. Hence this algorithm is especially suitable for very large size databases. The algorithm is also ideally suited for parallelization. We have performed extensive experiments and compared the performance of the algorithm with one of the best existing algorithms.
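For readers unfamiliar with the problem setting, the sketch below shows what an association rule over sales transactions measures, using the standard definitions of support and confidence. It is a brute-force illustration only: the function names and sample data are hypothetical, and it is not the algorithm presented in the paper.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimate of P(consequent | antecedent) over the transactions."""
    base = support(transactions, antecedent)
    both = support(transactions, set(antecedent) | set(consequent))
    return both / base if base else 0.0


# Illustrative sales transactions (made-up data).
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

rule_support = support(transactions, {"bread", "milk"})
rule_confidence = confidence(transactions, {"bread"}, {"milk"})
print(f"bread -> milk: support={rule_support:.2f}, confidence={rule_confidence:.2f}")
```

Each call to `support` scans every transaction, so evaluating many candidate itemsets this way is expensive on large databases. That CPU and I/O overhead is what the abstract says the proposed algorithm reduces.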
Adaptive and Automated Index Selection in Relational DBMS

(Georgia Institute of Technology, 1994) Omiecinski, Edward ; Navathe, Shamkant B. ; Frank, Martin Robert

We present a novel approach for a tool that assists the database administrator in designing an index configuration for a relational database system. A new methodology for collecting usage statistics at run time is developed which lets the optimizer estimate query execution costs for alternative index configurations. Defining the workload specification required by existing index design tools may be very complex for a large integrated database system. Our tool automatically derives the workload statistics. These statistics are then used to efficiently compute an index configuration. Execution of a prototype of the tool against a sample database demonstrates that the proposed index configuration is reasonably close to the optimum for test query sets.
Specification and Efficient Monitoring of Local Graph-based Constraints in Hypermedia Systems

(Georgia Institute of Technology, 1994) Arnold, Stephen ; Mark, Leo ; Navathe, Shamkant B.

The concept of hypermedia has existed for about fifty years. It became a practical technology in the seventies, and widely available in the eighties. The concept has proven quite useful as a paradigm for information presentation, and has been applied to information relevant to many diverse fields. However, the networks of semantic connections that exist in hypermedia systems are often so large and complex that they become overwhelming to people trying to find information in them. This paper presents a number of types of constraints representing application semantics as a way of reducing the complexity of networks of semantic connections in hypermedia. Unlike constraints developed in the past, those presented in this work are graph-based and can be evaluated within local regions of the hypermedia system. In addition, this work presents an algebra on overview graphs of hypermedia systems. To formalize the definitions of the algebra and constraints, this paper presents a data model for hypermedia. Also, this paper presents an algebra on networks of semantic connections in hypermedia. This algebra, in itself, can be used to define overview graphs on networks of semantic connections. In addition, the algebra increases the expressive power of the constraints by allowing the definition of overview graphs to which the constraints can be applied.