Navathe, Shamkant B.

Publication Search Results

  • Item
    Minimizing Redundant Work in Lazily Updated Replicated Databases
    (Georgia Institute of Technology, 2000) Omiecinski, Edward ; Navathe, Shamkant B. ; Yee, Wai Gen
    Modern databases that manage lazy (or deferred) updates to clients subscribing to replicated data do so on a client-by-client basis. They ignore any redundant work done during update processing caused by the commonality in client subscriptions to replicas. This paper proposes a new way to process updates which minimizes this redundancy and results in a reduction of update processing cost at the server in terms of disk space and time consumed in this phase. Ultimately, updates are available more quickly, and the duration during which clients must endure stale data is reduced. Results of studies involving iMobile, a currently available system, are reported and are extremely encouraging.
  • Item
    A Clustering Algorithm to Discover Low and High Density Hyper-Rectangles in Subspaces of Multidimensional Data.
    (Georgia Institute of Technology, 1999) Omiecinski, Edward ; Navathe, Shamkant B. ; Ezquerra, Norberto F. ; Ordóñez, Carlos
    This paper presents a clustering algorithm to discover low and high density regions in subspaces of multidimensional data for Data Mining applications. High density regions generally refer to typical cases, whereas low density regions indicate infrequent and thus rare cases. For typical applications there is a large number of low density regions and a few of these are interesting. Regions are considered interesting when they have a minimum "volume" and involve some maximum number of dimensions. Our algorithm discovers high density regions (clusters) and low density regions (outliers, negative clusters, holes, empty regions) at the same time. In particular, our algorithm can find empty regions; that is, regions having no data points. The proposed algorithm is fast and simple. There is a large variety of applications in medicine, marketing, astronomy, finance, etc., where interesting and exceptional cases correspond to the low and high density regions discovered by our algorithm.
  • Item
    A Greedy Approach For Improving Update Processing In Intermittently Synchronized Databases
    (Georgia Institute of Technology, 1999) Omiecinski, Edward ; Navathe, Shamkant B. ; Ammar, Mostafa H. ; Donahoo, Michael J. ; Malik, Sanjoy ; Yee, Wai Gen
    Replication of data on portable computers is a new DBMS technology aimed at catering to a growing population of mobile database users. Clients can download data items such as email or sales data from a server onto these machines, peruse it during commutes, and return any modifications to the server at the end of the day. In this paper, we describe how the servers in these systems generally process update information for clients and reveal a scalability problem--server processing increases quadratically with respect to increasing numbers of clients. We develop a cost model, and propose a solution based on heuristics. By aggregating client interests into datagroups, based on notions such as interest overlap, we can reduce server cost. These techniques are attractive because they are simple and computationally cheap. Simulations show that even simple techniques may yield significant performance improvements.
  • Item
    An Efficient Algorithm for Mining Association Rules in Large Databases
    (Georgia Institute of Technology, 1995) Omiecinski, Edward ; Navathe, Shamkant B. ; Savasere, Ashok
    Mining for association rules between items in a large database of sales transactions has been described as an important database mining problem. In this paper we present an efficient algorithm for mining association rules that is fundamentally different from known algorithms. Compared to the previous algorithms, our algorithm reduces both CPU and I/O overheads. In our experimental study, it was found that for large databases, the CPU overhead was reduced by as much as a factor of seven and I/O was reduced by almost an order of magnitude. Hence this algorithm is especially suitable for very large databases. The algorithm is also ideally suited for parallelization. We have performed extensive experiments and compared the performance of the algorithm with one of the best existing algorithms.
  • Item
    The Impact Of Data Placement Strategies On Reorganization Costs In Parallel Databases
    (Georgia Institute of Technology, 1995) Omiecinski, Edward ; Navathe, Shamkant B. ; Achyutuni, Kiran Jyotsna
    In this paper, we study the data placement problem from a reorganization point of view. Effective placement of the declustered fragments of a relation is crucial to the performance of parallel database systems having multiple disks. Given the dynamic nature of database systems, the optimal placement of fragments will change over time and this will necessitate a reorganization in order to maintain the performance of the database system at acceptable levels. This study shows that the choice of a data placement strategy can have a significant impact on the reorganization costs. Up until now, data placement heuristics were designed with the principal purpose of balancing the load. However, this paper shows that such a policy can be beneficial only in the short term. Long term database designs should take reorganization costs into consideration while making design choices.
  • Item
    Adaptive and Automated Index Selection in Relational DBMS
    (Georgia Institute of Technology, 1994) Omiecinski, Edward ; Navathe, Shamkant B. ; Frank, Martin Robert
    We present a novel approach for a tool that assists the database administrator in designing an index configuration for a relational database system. A new methodology for collecting usage statistics at run time is developed which lets the optimizer estimate query execution costs for alternative index configurations. Defining the workload specification required by existing index design tools may be very complex for a large integrated database system. Our tool automatically derives the workload statistics. These statistics are then used to efficiently compute an index configuration. Execution of a prototype of the tool against a sample database demonstrates that the proposed index configuration is reasonably close to the optimum for test query sets.