Navathe, Shamkant B.
Publication Search Results
Towards Transactional Data Management over the Cloud
(Georgia Institute of Technology, 2010) Tiwari, Rohan G.; Navathe, Shamkant B.; Kulkarni, Gaurav J.
We propose a consistency model for a data store in the Cloud and work towards the goal of deploying Database as a Service over the Cloud. This includes consistency across the data partitions and consistency of any replicas that exist across different nodes in the system. We target applications which need stronger consistency guarantees than the applications currently supported by the data stores on the Cloud. We propose a cost-effective algorithm that ensures distributed consistency of data without really compromising on availability for fully replicated data. This paper describes a design in progress, presents the consistency and recovery algorithms for relational data, highlights the guarantees provided by the system and presents future research challenges. We believe that the current notions of consistency for databases might not be applicable over the Cloud and a new formulation of the consistency concept may be needed keeping in mind the application classes we aim to support.
Minimizing Redundant Work in Lazily Updated Replicated Databases
(Georgia Institute of Technology, 2000) Omiecinski, Edward; Navathe, Shamkant B.; Yee, Wai Gen
Modern databases which manage lazy (or deferred) updates to clients which subscribe to replicated data do so on a client-by-client basis. They ignore any redundant work done during update processing caused by the commonality in client subscriptions to replicas. This paper proposes a new way to process updates which minimizes this redundancy and results in a reduction of update processing cost at the server in terms of disk space and time consumed in this phase. Ultimately, updates are available more quickly, and the duration during which clients must endure stale data is reduced. Results of studies involving iMobile, a currently available system, are reported, and are extremely encouraging.
A Mathematical Optimization Approach To Improve Server Scalability In Intermittently Synchronized Databases
(Georgia Institute of Technology, 1999) Yee, Wai Gen; Navathe, Shamkant B.; Datta, Anindya; Mitra, Sabyasachi
This paper addresses a scalability problem in the process of synchronizing the states of multiple client databases which only have deferred access to the server. It turns out that the process of client update file generation is not scalable with the number of clients served. In this paper we concentrate on developing an optimization model to address the scalability problem at the server by aiming for an optimal grouping of data fragments at the server given the "interest sets" of the clients - the set of fragments the client deals with for its "local" processing. The objective is to minimize the total cost of server operation which includes processing updates from all clients and transmission cost of sending the right set of updates to each client based on the client's interest set. An integer programming formulation is developed and solved with an illustrative problem, yielding interesting results.
Toward A Method of Grouping Server Data Fragments for Improving Scalability in Intermittently Synchronized Databases
(Georgia Institute of Technology, 1999) Yee, Wai Gen; Donahoo, Michael J.; Navathe, Shamkant B.
We consider the class of mobile computing applications with periodically connected clients. These clients wish to share data; however, due to the expense of mobile communication, they only connect periodically -- and not necessarily synchronously -- to a common network. Traditionally, a continuously-connected server, containing an aggregate of client data, facilitates sharing amongst clients by allowing the clients to upload local updates and download updates submitted by other clients. The server computes and transmits these updates on a client-by-client basis; consequently, the complexity of these operations is on the order of the number of clients, limiting scalability. Recent research proposes exploiting client data overlap by grouping updates according to how the data is shared amongst clients (data-centric) instead of on a client-by-client basis (client-centric). Each client downloads updates for the relevant set of groups. By grouping, update operation distribution is computed only once per group, irrespective of the number of clients downloading a particular group's updates. Additionally, we may gain bandwidth scalability by employing broadcast delivery since, unlike the case in the per-client approach, multiple clients may be interested in a group's updates. Clearly, group composition directly affects the scalability of this approach. Given a relative cost of resources such as server processing, bandwidth, and storage space, we focus on developing a group derivation approach that significantly improves the scalability of the resources. We construct a formal specification of this problem and discuss the intractability of an optimal solution. Based on observations from the specification, we derive a heuristically based approach and evaluate its efficacy with respect to the client-centric approach. We run experiments on an implemented system that demonstrates that as the amount of overlap increases between client subscriptions, the data-centric approach with groups generated by our heuristic-based algorithm yields significant cost reduction when compared to the traditional client-centric approach.
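The difference between client-centric and data-centric update processing described in this abstract can be illustrated with a minimal sketch. It is not the heuristic from the paper: it assumes, purely for illustration, that fragments are grouped whenever they have exactly the same set of subscribing clients, and it counts one unit of server work per fragment processed. The function names and the sample subscriptions are hypothetical.

```python
from collections import defaultdict


def group_by_subscribers(subscriptions):
    """Group fragments that are subscribed to by exactly the same clients.

    `subscriptions` maps a client name to the set of fragments it wants.
    Returns a dict: frozenset of clients -> set of fragments in that group.
    (Illustrative grouping rule only; not the paper's algorithm.)
    """
    subscribers = defaultdict(set)
    for client, fragments in subscriptions.items():
        for fragment in fragments:
            subscribers[fragment].add(client)

    groups = defaultdict(set)
    for fragment, clients in subscribers.items():
        groups[frozenset(clients)].add(fragment)
    return dict(groups)


def client_centric_work(subscriptions):
    """One unit of work per (client, fragment) pair: shared fragments
    are processed again for every client that subscribes to them."""
    return sum(len(fragments) for fragments in subscriptions.values())


def data_centric_work(subscriptions):
    """One unit of work per distinct fragment: each group's updates are
    computed once, no matter how many clients download them."""
    return len(set().union(*subscriptions.values()))


if __name__ == "__main__":
    subs = {"c1": {"A", "B"}, "c2": {"A", "B", "C"}, "c3": {"C"}}
    for clients, fragments in group_by_subscribers(subs).items():
        print(sorted(clients), "->", sorted(fragments))
    print("client-centric work:", client_centric_work(subs))
    print("data-centric work:", data_centric_work(subs))
```

In this toy setting the gap between the two counts grows with the overlap between client subscriptions, which is the effect the abstract reports; real costs would also include bandwidth and storage, which the sketch ignores.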
A Clustering Algorithm to Discover Low and High Density Hyper-Rectangles in Subspaces of Multidimensional Data.
(Georgia Institute of Technology, 1999) Omiecinski, Edward; Navathe, Shamkant B.; Ezquerra, Norberto F.; Ordóñez, Carlos
This paper presents a clustering algorithm to discover low and high density regions in subspaces of multidimensional data for Data Mining applications. High density regions generally refer to typical cases, whereas low density regions indicate infrequent and thus rare cases. For typical applications there is a large number of low density regions and a few of these are interesting. Regions are considered interesting when they have a minimum "volume" and involve some maximum number of dimensions. Our algorithm discovers high density regions (clusters) and low density regions (outliers, negative clusters, holes, empty regions) at the same time. In particular, our algorithm can find empty regions; that is, regions having no data points. The proposed algorithm is fast and simple. There is a large variety of applications in medicine, marketing, astronomy, finance, etc., where interesting and exceptional cases correspond to the low and high density regions discovered by our algorithm.
A Greedy Approach For Improving Update Processing In Intermittently Synchronized Databases
(Georgia Institute of Technology, 1999) Omiecinski, Edward; Navathe, Shamkant B.; Ammar, Mostafa H.; Donahoo, Michael J.; Malik, Sanjoy; Yee, Wai Gen
Replication of data on portable computers is a new DBMS technology aimed at catering to a growing population of mobile database users. Clients can download data items such as email or sales data from a server onto these machines, peruse it during commutes, and return any modifications to the server at the end of the day. In this paper, we describe how the servers in these systems generally process update information for clients and reveal a scalability problem: server processing increases quadratically with respect to increasing numbers of clients. We develop a cost model, and propose a solution based on heuristics. By aggregating client interests into datagroups, based on notions such as interest overlap, we can reduce server cost. These techniques are attractive because they are simple and computationally cheap. Simulations show that even simple techniques may yield significant performance improvements.
A knowledge-based approach to integrating and querying distributed heterogeneous information systems
(Georgia Institute of Technology, 1995) Navathe, Shamkant B.
An Efficient Algorithm for Mining Association Rules in Large Databases
(Georgia Institute of Technology, 1995) Omiecinski, Edward; Navathe, Shamkant B.; Savasere, Ashok
Mining for association rules between items in a large database of sales transactions has been described as an important database mining problem. In this paper we present an efficient algorithm for mining association rules that is fundamentally different from known algorithms. Compared to the previous algorithms, our algorithm reduces both CPU and I/O overheads. In our experimental study it was found that for large databases, the CPU overhead was reduced by as much as a factor of seven and I/O was reduced by almost an order of magnitude. Hence this algorithm is especially suitable for very large size databases. The algorithm is also ideally suited for parallelization. We have performed extensive experiments and compared the performance of the algorithm with one of the best existing algorithms.
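For readers unfamiliar with the problem this abstract refers to, the sketch below shows the standard support and confidence measures used to evaluate an association rule over sales transactions. It illustrates the problem setting only and is not the algorithm presented in the paper; the function names and the sample transactions are hypothetical.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= set(t))
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Support of (antecedent and consequent together) divided by the
    support of the antecedent; 0.0 if the antecedent never occurs."""
    joint = support(transactions, set(antecedent) | set(consequent))
    base = support(transactions, antecedent)
    return joint / base if base else 0.0


if __name__ == "__main__":
    # Hypothetical sales transactions, one set of items per purchase.
    sales = [
        {"bread", "milk"},
        {"bread", "butter"},
        {"bread", "milk", "butter"},
        {"milk"},
    ]
    print("support(bread, milk):", support(sales, {"bread", "milk"}))
    print("confidence(bread -> milk):", confidence(sales, {"bread"}, {"milk"}))
```

A rule such as "bread implies milk" is typically reported only when both measures exceed user-chosen thresholds; finding all such rules efficiently in very large databases is the problem the paper addresses.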
The Impact Of Data Placement Strategies On Reorganization Costs In Parallel Databases
(Georgia Institute of Technology, 1995) Omiecinski, Edward; Navathe, Shamkant B.; Achyutuni, Kiran Jyotsna
In this paper, we study the data placement problem from a reorganization point of view. Effective placement of the declustered fragments of a relation is crucial to the performance of parallel database systems having multiple disks. Given the dynamic nature of database systems, the optimal placement of fragments will change over time and this will necessitate a reorganization in order to maintain the performance of the database system at acceptable levels. This study shows that the choice of a data placement strategy can have a significant impact on the reorganization costs. Up until now, data placement heuristics were designed with the principal purpose of balancing the load. However, this paper shows that such a policy can be beneficial only in the short term. Long term database designs should take reorganization costs into consideration while making design choices.
Querying, Navigating and Visualizing an Online Library Catalog
(Georgia Institute of Technology, 1995) Veerasamy, Aravindan; Hudson, Scott E.; Navathe, Shamkant B.
We describe the design of a User Interface for a ranked output Information Retrieval system that integrates querying, navigation and visualization in a seamless fashion. Highlights of the system include the following:
-- Using a visualization scheme, the interface provides visual feedback to the user about how the query words influence the ranking of retrieved documents.
-- By simple drag-and-drop operations of objects on the screen, the interface facilitates a naive end-user in constructing complex structured queries and in providing relevance feedback.
-- To suit the evolving information needs of the user, the interface supports navigational features such as browsing documents by specific authors and browsing the Table of Contents of publications.
-- The interface integrates an online thesaurus which provides words related to the query that can be used by the user to expand the original query.
By providing a rich set of features, the interface coherently supports a wide spectrum of information gathering tactics for different classes of users.