Navathe, Shamkant B.
An Efficient Algorithm for Mining Association Rules in Large Databases (Georgia Institute of Technology, 1995) Omiecinski, Edward ; Navathe, Shamkant B. ; Savasere, Ashok ; College of Computing
Mining for association rules between items in a large database of sales transactions has been described as an important database mining problem. In this paper we present an efficient algorithm for mining association rules that is fundamentally different from known algorithms. Compared with previous algorithms, our algorithm reduces both CPU and I/O overheads. In our experimental study, we found that for large databases the CPU overhead was reduced by as much as a factor of seven and I/O was reduced by almost an order of magnitude. Hence this algorithm is especially suitable for very large databases. The algorithm is also ideally suited for parallelization. We have performed extensive experiments and compared the performance of the algorithm with one of the best existing algorithms.
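The abstract does not describe how the algorithm works internally. One way to reduce I/O, assumed here for illustration and not taken from the abstract, is a two-scan, partition-based scheme. The first scan mines each in-memory partition for locally frequent itemsets. Any globally frequent itemset must be locally frequent in at least one partition, so the union of those local results is a complete candidate set. A second scan then counts global support. The function names, the toy brute-force local miner, and the (lhs, rhs, confidence) rule format are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def local_frequent(partition, min_frac):
    """Brute-force miner for one small partition: every itemset whose
    local support fraction is at least min_frac (fine for toy data only)."""
    threshold = min_frac * len(partition)
    counts = Counter()
    for txn in partition:
        items = sorted(txn)  # sorted tuples give each itemset one canonical key
        for k in range(1, len(items) + 1):
            for combo in combinations(items, k):
                counts[combo] += 1
    return {s for s, c in counts.items() if c >= threshold}


def partition_mine(transactions, min_frac, num_parts):
    """Hypothetical two-scan miner: local candidates, then global counting."""
    size = max(1, -(-len(transactions) // num_parts))  # ceiling division
    # Scan 1: union of locally frequent itemsets over all partitions.
    candidates = set()
    for start in range(0, len(transactions), size):
        candidates |= local_frequent(transactions[start:start + size], min_frac)
    # Scan 2: count the global support of every candidate.
    counts = Counter()
    for txn in transactions:
        t = set(txn)
        for cand in candidates:
            if t.issuperset(cand):
                counts[cand] += 1
    threshold = min_frac * len(transactions)
    return {c: n for c, n in counts.items() if n >= threshold}


def rules(frequent, min_conf):
    """Rules lhs -> rhs with confidence = support(lhs + rhs) / support(lhs)."""
    found = []
    for itemset, count in frequent.items():
        for k in range(1, len(itemset)):
            for lhs in combinations(itemset, k):
                conf = count / frequent[lhs]  # every subset is also frequent
                if conf >= min_conf:
                    rhs = tuple(i for i in itemset if i not in lhs)
                    found.append((lhs, rhs, conf))
    return found
```

For example, `partition_mine([{"bread", "milk"}, {"bread", "butter"}, {"bread", "milk", "butter"}, {"milk"}], 0.5, 2)` reads the data only twice regardless of itemset size. In contrast, a level-wise miner may rescan once per itemset length.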
A knowledge-based approach to integrating and querying distributed heterogeneous information systems (Georgia Institute of Technology, 1995) Navathe, Shamkant B. ; Georgia Institute of Technology. Office of Sponsored Programs ; Georgia Institute of Technology. College of Computing
Modeling of database constraints in active databases (Georgia Institute of Technology, 1993) Navathe, Shamkant B. ; Georgia Institute of Technology. Office of Sponsored Programs ; Georgia Institute of Technology. College of Computing
Toward A Method of Grouping Server Data Fragments for Improving Scalability in Intermittently Synchronized Databases (Georgia Institute of Technology, 1999) Yee, Wai Gen ; Donahoo, Michael J. ; Navathe, Shamkant B. ; College of Computing
We consider the class of mobile computing applications with periodically connected clients. These clients wish to share data; however, because mobile communication is expensive, they connect to a common network only periodically, and not necessarily synchronously. Traditionally, a continuously connected server containing an aggregate of client data facilitates sharing among clients by allowing the clients to upload local updates and download updates submitted by other clients. The server computes and transmits these updates on a client-by-client basis; consequently, the complexity of these operations is on the order of the number of clients, which limits scalability. Recent research proposes exploiting client data overlap by grouping updates according to how the data is shared among clients (data-centric) instead of on a client-by-client basis (client-centric). Each client downloads updates for its relevant set of groups. With grouping, the distribution of update operations is computed only once per group, irrespective of the number of clients downloading a particular group's updates. Additionally, we may gain bandwidth scalability by employing broadcast delivery since, unlike in the per-client approach, multiple clients may be interested in a group's updates. Clearly, group composition directly affects the scalability of this approach. Given the relative costs of resources such as server processing, bandwidth, and storage space, we focus on developing a group derivation approach that significantly improves scalability with respect to these resources. We construct a formal specification of this problem and discuss the intractability of an optimal solution.
Based on observations from the specification, we derive a heuristic approach and evaluate its efficacy against the client-centric approach. Experiments on an implemented system demonstrate that as the overlap between client subscriptions increases, the data-centric approach, with groups generated by our heuristic algorithm, yields significant cost reductions compared with the traditional client-centric approach.
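The data-centric idea can be made concrete with the simplest possible grouping: put fragments that are subscribed to by exactly the same set of clients into one group. The server then prepares updates once per group instead of once per client. This exact-match rule is only a baseline assumed for illustration. It is not the paper's heuristic, which is not reproduced here. The function names and the fragment and client identifiers are hypothetical.

```python
from collections import defaultdict


def group_by_subscribers(interest_sets):
    """Baseline grouping (assumed): fragments with identical subscriber
    sets share a group. Returns {frozenset(clients): set(fragments)}."""
    subscribers = defaultdict(set)
    for client, fragments in interest_sets.items():
        for frag in fragments:
            subscribers[frag].add(client)
    groups = defaultdict(set)
    for frag, clients in subscribers.items():
        groups[frozenset(clients)].add(frag)
    return dict(groups)


def groups_for_client(groups, client):
    """The groups a client downloads: every group it subscribes to."""
    return [frags for clients, frags in groups.items() if client in clients]
```

With three clients that all share fragments `f1` and `f2`, and one client that also holds `f3`, the server maintains two groups rather than three per-client update files. The saving grows with subscription overlap, which matches the trend the abstract reports. When overlap is only partial, however, exact matching can fragment the data into many small groups.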
A Clustering Algorithm to Discover Low and High Density Hyper-Rectangles in Subspaces of Multidimensional Data (Georgia Institute of Technology, 1999) Omiecinski, Edward ; Navathe, Shamkant B. ; Ezquerra, Norberto F. ; Ordóñez, Carlos ; College of Computing
This paper presents a clustering algorithm to discover low and high density regions in subspaces of multidimensional data for data mining applications. High density regions generally correspond to typical cases, whereas low density regions indicate infrequent, and thus rare, cases. In typical applications there is a large number of low density regions, and only a few of these are interesting. Regions are considered interesting when they have a minimum "volume" and involve at most some maximum number of dimensions. Our algorithm discovers high density regions (clusters) and low density regions (outliers, negative clusters, holes, empty regions) at the same time. In particular, our algorithm can find empty regions, that is, regions containing no data points. The proposed algorithm is fast and simple. There is a large variety of applications in medicine, marketing, astronomy, finance, etc., where interesting and exceptional cases correspond to the low and high density regions discovered by our algorithm.
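To see how dense and empty regions can be detected in the same pass over a chosen subspace, consider an equal-width grid. Count the points per cell in the selected dimensions. Cells at or above a count threshold are dense, and cells with zero points are empty. This grid sketch is an assumption for illustration. The abstract does not say the algorithm uses a grid, and the sketch does not apply the minimum-volume or maximum-dimensionality criteria described above. The function names, the bin count, and `dense_min` are hypothetical.

```python
from collections import Counter
from itertools import product


def grid_density(points, dims, bins, lo, hi):
    """Count points per cell of an equal-width grid over the subspace `dims`.
    lo[d] and hi[d] bound dimension d; values equal to hi fall in the last bin."""
    widths = [(hi[d] - lo[d]) / bins for d in dims]
    counts = Counter()
    for p in points:
        cell = tuple(min(int((p[d] - lo[d]) / w), bins - 1)
                     for d, w in zip(dims, widths))
        counts[cell] += 1
    return counts


def classify_cells(counts, num_dims, bins, dense_min):
    """Split all grid cells into dense ones (>= dense_min points) and
    empty ones (0 points); everything else is neither."""
    dense, empty = [], []
    for cell in product(range(bins), repeat=num_dims):
        c = counts.get(cell, 0)
        if c == 0:
            empty.append(cell)
        elif c >= dense_min:
            dense.append(cell)
    return dense, empty
```

Enumerating every cell, rather than only the occupied ones, is what lets the empty regions surface. This enumeration is also why such a scan is limited to low-dimensional subspaces: the cell count grows as `bins ** num_dims`.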
A methodology for application design using active database technology (Georgia Institute of Technology, 1993) Navathe, Shamkant B. ; Georgia Institute of Technology. Office of Sponsored Programs ; Georgia Institute of Technology. College of Computing
A Mathematical Optimization Approach To Improve Server Scalability In Intermittently Synchronized Databases (Georgia Institute of Technology, 1999) Yee, Wai Gen ; Navathe, Shamkant B. ; Datta, Anindya ; Mitra, Sabyasachi ; College of Computing
This paper addresses a scalability problem in synchronizing the states of multiple client databases that have only deferred access to the server. The process of generating client update files does not scale with the number of clients served. In this paper we concentrate on developing an optimization model that addresses the scalability problem at the server by seeking an optimal grouping of data fragments at the server, given the "interest sets" of the clients, i.e., the set of fragments each client deals with for its "local" processing. The objective is to minimize the total cost of server operation, which includes processing updates from all clients and the transmission cost of sending the right set of updates to each client based on its interest set. An integer programming formulation is developed and solved for an illustrative problem, yielding interesting results.
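The paper's integer program is not given in the abstract. The trade-off it optimizes can, however, be shown on a toy instance by exhaustively enumerating every grouping of fragments under an assumed cost model: a fixed processing cost `proc` per group, plus a transmission cost `tx` per fragment for every group a client must download because the group overlaps its interest set. This cost model and the brute-force search are illustrative stand-ins for the paper's formulation and solver. They are feasible only for a handful of fragments, since the number of groupings grows as the Bell numbers.

```python
def set_partitions(items):
    """Yield every partition of `items` into non-empty blocks (lists)."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for part in set_partitions(rest):
        yield [[first]] + part  # `first` in a block of its own
        for i in range(len(part)):  # or joined to an existing block
            yield part[:i] + [[first] + part[i]] + part[i + 1:]


def grouping_cost(groups, interest_sets, proc, tx):
    """Assumed model: proc per group, plus tx per fragment of each group
    that a client downloads because it overlaps the client's interest set."""
    cost = proc * len(groups)
    for interest in interest_sets.values():
        for g in groups:
            if interest.intersection(g):
                cost += tx * len(g)
    return cost


def best_grouping(fragments, interest_sets, proc, tx):
    """Exhaustive search for the cheapest grouping (toy sizes only)."""
    return min(set_partitions(list(fragments)),
               key=lambda gs: grouping_cost(gs, interest_sets, proc, tx))
```

With fragments `f1`, `f2`, `f3`, two clients interested in `{f1, f2}`, and one in `{f3}`, cheap processing (`proc=1, tx=1`) favors the groups `{f1, f2}` and `{f3}`. Expensive processing (`proc=10, tx=1`) instead favors a single group, even though clients then receive fragments they did not ask for.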
Adaptive and Automated Index Selection in Relational DBMS (Georgia Institute of Technology, 1994) Omiecinski, Edward ; Navathe, Shamkant B. ; Frank, Martin Robert ; College of Computing
We present a novel approach for a tool that assists the database administrator in designing an index configuration for a relational database system. A new methodology for collecting usage statistics at run time is developed which lets the optimizer estimate query execution costs for alternative index configurations. Defining the workload specification required by existing index design tools may be very complex for a large integrated database system. Our tool automatically derives the workload statistics. These statistics are then used to efficiently compute an index configuration. Execution of a prototype of the tool against a sample database demonstrates that the proposed index configuration is reasonably close to the optimum for test query sets.
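As an illustration of how run-time usage statistics can drive index choice, the sketch below ranks candidate single-column indexes by an assumed net benefit. The benefit is the query-cost savings from selections on the column minus the maintenance cost incurred by updates. The statistics fields, the additive benefit formula, and the cap on the number of indexes are hypothetical, since the abstract does not give the tool's cost model. Scoring each index independently also ignores index interactions that an optimizer-driven tool would capture.

```python
from dataclasses import dataclass


@dataclass
class ColumnStats:
    """Hypothetical per-column usage statistics gathered at run time."""
    selects: int        # queries that filter on this column
    updates: int        # writes that would also have to maintain an index
    scan_cost: float    # estimated cost per query without an index
    index_cost: float   # estimated cost per query with an index
    maint_cost: float   # estimated index maintenance cost per update


def net_benefit(s):
    """Assumed additive model: savings on reads minus maintenance on writes."""
    return s.selects * (s.scan_cost - s.index_cost) - s.updates * s.maint_cost


def choose_indexes(stats, max_indexes):
    """Pick up to max_indexes columns whose net benefit is positive."""
    ranked = sorted(stats.items(), key=lambda kv: net_benefit(kv[1]),
                    reverse=True)
    return [col for col, s in ranked[:max_indexes] if net_benefit(s) > 0]
```

Because the inputs are refreshed from observed usage rather than from a hand-written workload specification, rerunning `choose_indexes` periodically gives the adaptive behavior the title refers to, at least under this simplified model.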
The Impact Of Data Placement Strategies On Reorganization Costs In Parallel Databases (Georgia Institute of Technology, 1995) Omiecinski, Edward ; Navathe, Shamkant B. ; Achyutuni, Kiran Jyotsna ; College of Computing
In this paper, we study the data placement problem from a reorganization point of view. Effective placement of the declustered fragments of a relation is crucial to the performance of parallel database systems with multiple disks. Given the dynamic nature of database systems, the optimal placement of fragments will change over time, which necessitates reorganization to keep the performance of the database system at acceptable levels. This study shows that the choice of data placement strategy can have a significant impact on reorganization costs. Until now, data placement heuristics were designed with the principal purpose of balancing the load. However, this paper shows that such a policy is beneficial only in the short term. Long-term database designs should take reorganization costs into consideration when making design choices.
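A small count makes the point about reorganization cost concrete. Under round-robin placement (`fragment i` on `disk i mod n`), adding one disk reassigns most fragments. An incremental policy that moves only enough fragments to fill the new disk reaches the same balance while moving far fewer. Neither policy is taken from the paper. Both are assumed here only to show that two load-balanced placements can differ sharply in how much data a reorganization moves. The function names are hypothetical.

```python
from collections import Counter


def round_robin(num_frags, disks):
    """Placement list: fragment i goes to disk i mod disks."""
    return [i % disks for i in range(num_frags)]


def moved(before, after):
    """Number of fragments whose disk changes between two placements."""
    return sum(1 for a, b in zip(before, after) if a != b)


def add_disk_min_move(before, old_disks):
    """Assumed incremental policy: move fragments from overloaded disks onto
    the new disk (index old_disks) until it holds its fair share."""
    after = list(before)
    new = old_disks
    target = len(before) // (old_disks + 1)
    load = Counter(before)
    for i, d in enumerate(before):
        if load[new] >= target:
            break
        if load[d] > target:
            after[i] = new
            load[d] -= 1
            load[new] += 1
    return after
```

For 20 fragments going from 4 to 5 disks, recomputing round-robin moves 16 fragments, while the incremental policy moves 4. Both end with 4 fragments per disk.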
Querying, Navigating and Visualizing an Online Library Catalog (Georgia Institute of Technology, 1995) Veerasamy, Aravindan ; Hudson, Scott E. ; Navathe, Shamkant B. ; GVU Center
We describe the design of a user interface for a ranked-output information retrieval system that integrates querying, navigation, and visualization seamlessly. Highlights of the system include the following:
-- Using a visualization scheme, the interface gives the user visual feedback about how the query words influence the ranking of retrieved documents.
-- Through simple drag-and-drop operations on screen objects, the interface helps a naive end user construct complex structured queries and provide relevance feedback.
-- To suit the evolving information needs of the user, the interface supports navigational features such as browsing documents by specific authors and browsing the tables of contents of publications.
-- The interface integrates an online thesaurus that suggests words related to the query, which the user can use to expand the original query.
By providing a rich set of features, the interface coherently supports a wide spectrum of information-gathering tactics for different classes of users.
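Showing how each query word influences a document's rank requires a scoring function that decomposes into per-term parts. The abstract does not say which ranking function the system used. The sketch below assumes a plain tf-idf sum, chosen because each term's contribution can be displayed directly, for example as a bar per query word. The function names and the token-list corpus format are hypothetical.

```python
import math


def term_contributions(query_terms, doc_tokens, corpus):
    """Assumed tf-idf scoring: each query term contributes tf * log(N / df).
    Returns {term: contribution} for one document."""
    n = len(corpus)
    contrib = {}
    for term in query_terms:
        df = sum(1 for d in corpus if term in d)
        idf = math.log(n / df) if df else 0.0
        contrib[term] = doc_tokens.count(term) * idf
    return contrib


def rank(query_terms, corpus):
    """Rank documents by total score; keep the per-term breakdown so an
    interface could show why each document landed where it did."""
    scored = []
    for i, doc in enumerate(corpus):
        parts = term_contributions(query_terms, doc, corpus)
        scored.append((sum(parts.values()), i, parts))
    scored.sort(key=lambda x: (-x[0], x[1]))  # best first, ties by position
    return scored
```

Under this model, a term that occurs in every document gets an idf of zero and contributes nothing. A per-term display would make that visible to the user, who might then drop or replace the word, for instance with a thesaurus suggestion.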