Person:
Schwan, Karsten

Publication Search Results

Now showing 1 - 10 of 56
  • Item
    XCHANGE: High Performance Data Morphing in Distributed Applications
    (Georgia Institute of Technology, 2005) Lofstead, Jay ; Schwan, Karsten
    Distributed applications in which large volumes of data are exchanged between components that generate, process, and store or display data are common in both the high performance and enterprise domains. A key issue in these domains is the mismatch between the data being generated and the data required by end users or by intermediate components. Mismatches are due to the need to customize or personalize data for certain end users, or they arise from natural differences in the data representations used by different components. In either case, mismatch correction, that is, data morphing, requires either servers or clients to perform extensive data processing. This paper describes automated methods and associated tools for morphing data in overlay networks that connect data producers with consumers. These methods automatically generate data transformation codes from declarative specifications, ‘just in time’, i.e., when and as needed. By describing data transformations declaratively, code generation can take into account the current nature of the data being generated, the current needs of data sinks, and the current resources available in the overlay connecting sources to sinks. In addition, code generation can consider the shared requirements of multiple consumers, to reduce redundant data transmissions and transformations. Data morphing is realized with the XCHANGE toolset, and in this paper, it is applied to both high performance and enterprise applications. Runtime generation and deployment of data morphing codes for filtering and transforming the large data volumes exchanged in a high performance remote data visualization is shown to improve network usage, generating code that matches the data volumes exchanged to the available network resources. Morphing codes dynamically generated and deployed for an enterprise application in the healthcare domain demonstrate the importance of generating code so as to improve server scalability by reducing server loads.
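As an aside to the abstract above, the following is a minimal sketch of a declarative, consumer-driven transformation: a specification that projects only the fields a consumer asked for and thins the event stream. It is not taken from the XCHANGE toolset; names such as MorphSpec are hypothetical, and XCHANGE would compile such a specification into transformation code at runtime rather than interpret it as done here.

```java
import java.util.*;

// Illustrative sketch only: a declarative "morph" specification that selects
// the fields a consumer needs and thins the data stream, in the spirit of
// just-in-time data morphing. All names are hypothetical.
public class MorphSketch {

    /** Declarative description of what a consumer wants from each data event. */
    record MorphSpec(Set<String> wantedFields, int keepEveryNth) {}

    /** Apply the spec to a sequence of events, dropping fields and samples the consumer did not ask for. */
    static List<Map<String, Object>> morph(List<Map<String, Object>> events, MorphSpec spec) {
        List<Map<String, Object>> out = new ArrayList<>();
        for (int i = 0; i < events.size(); i++) {
            if (i % spec.keepEveryNth() != 0) continue;           // thin the stream
            Map<String, Object> reduced = new LinkedHashMap<>();
            for (String f : spec.wantedFields()) {                // project wanted fields
                if (events.get(i).containsKey(f)) reduced.put(f, events.get(i).get(f));
            }
            out.add(reduced);
        }
        return out;
    }

    public static void main(String[] args) {
        List<Map<String, Object>> events = List.of(
            Map.of("temperature", 300.1, "pressure", 2.5, "velocity", 0.9),
            Map.of("temperature", 301.4, "pressure", 2.6, "velocity", 1.1));
        MorphSpec spec = new MorphSpec(Set.of("temperature"), 1);  // consumer only needs temperature
        System.out.println(morph(events, spec));
    }
}
```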
  • Item
    Java Mirrors: Building Blocks for Interacting with High Performance Applications
    (Georgia Institute of Technology, 2005) Chen, Yuan ; Schwan, Karsten ; Rosen, David W.
    Mirror objects are the key building blocks in the virtual 'workbenches' and 'portals' for scientific and engineering applications constructed by our group. This paper uses mirror objects in the implementation of the RTTB design workbench, which controls components of the RTTB rapid tooling and prototyping testbed. Mirror objects continuously mirror the states of remote software or even hardware entities, and the operations performed on mirrors are automatically propagated to these entities. Thus, end users perceive mirrors as virtualizations of remote entities. This paper presents the concept of mirror objects, their JMOSS Java-based implementation, the interoperation of JMOSS Java mirrors with the CORBA-based MOSS mirror object implementation, demonstrations of mirror functionality and utility with a virtual 'design workbench' used by engineers for rapid tooling and prototyping processes, and performance evaluations of mirror objects. We also present initial evaluations of JMOSS mirrors in mobile environments, where workbench users can continue their PC-based online interactions via handheld devices carried to the shop floor.
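The mirror-object pattern described above can be summarized in a few lines. The sketch below is not the JMOSS API; the Transport and ToolMirror names are hypothetical. It only shows the two defining properties of a mirror: remote state changes are continuously reflected into the local object, and operations invoked locally are propagated back to the remote entity.

```java
import java.util.function.Consumer;

// Minimal sketch of the mirror-object idea, not the JMOSS implementation.
public class MirrorSketch {

    /** Hypothetical transport: sends commands to, and receives state updates from, the remote entity. */
    interface Transport {
        void sendCommand(String command);
        void onStateUpdate(Consumer<String> listener);
    }

    /** A mirror of a remote machine-tool controller, as a design workbench might use it. */
    static class ToolMirror {
        private final Transport transport;
        private volatile String lastKnownState = "unknown";

        ToolMirror(Transport transport) {
            this.transport = transport;
            // State changes of the remote entity are continuously reflected into the mirror.
            transport.onStateUpdate(s -> lastKnownState = s);
        }

        /** Operations invoked on the mirror are propagated to the remote entity. */
        void start() { transport.sendCommand("START"); }
        void stop()  { transport.sendCommand("STOP"); }

        /** End users read the mirror as if it were the remote entity itself. */
        String state() { return lastKnownState; }
    }
}
```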
  • Item
    Leveraging Block Decisions and Aggregation in the ShareStreams QoS Architecture
    (Georgia Institute of Technology, 2003) Krishnamurthy, Rajaram B. ; Yalamanchili, Sudhakar ; Schwan, Karsten ; West, Richard
    ShareStreams (Scalable Hardware Architectures for Stream Schedulers) is a canonical architecture for realizing a range of scheduling disciplines. This paper discusses the design choices and tradeoffs made in the development of an endsystem/host-based router realization of the ShareStreams architecture. We evaluate the impact of block decisions and aggregation on the ShareStreams architecture. Using processor resources for queuing and data movement, and FPGA hardware for accelerating stream selection and stream priority updates, ShareStreams can easily meet the wire speeds of 10 Gbps links. This allows the provision of customized scheduling solutions and interoperability of scheduling disciplines. The FPGA hardware uses a single-cycle Decision block to compare multiple stream attributes simultaneously for pairwise ordering, and an arrangement of Decision blocks in a recirculating network to conserve area and improve scalability. Our hardware, implemented in the Xilinx Virtex family, easily scales from 4 to 32 stream-slots on a single chip. A running FPGA prototype in a PCI card under systems software control can provide scheduling support for a mix of EDF, static-priority, and fair-share streams based on user specifications and meet the temporal bounds and packet-time requirements of multi-gigabit links.
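For readers unfamiliar with the pairwise Decision block mentioned above, here is a software analogue. In the actual design each comparison completes in a single hardware cycle, and the attributes and tie-breaking rules depend on the configured scheduling discipline; the fields and ordering rule below are illustrative assumptions only.

```java
// Sketch only: a software analogue of a pairwise Decision block reduced over
// stream slots in a recirculating (tournament-style) fashion.
public class DecisionSketch {

    /** One stream slot: a static priority and the deadline of its head packet. */
    record Slot(int id, int staticPriority, long headDeadline) {}

    /** Pairwise decision: prefer the higher static priority, break ties by earlier deadline. */
    static Slot decide(Slot a, Slot b) {
        if (a.staticPriority() != b.staticPriority())
            return a.staticPriority() > b.staticPriority() ? a : b;
        return a.headDeadline() <= b.headDeadline() ? a : b;
    }

    /** Recirculate pairwise decisions over all slots to pick the next stream to service. */
    static Slot selectNext(Slot[] slots) {
        Slot winner = slots[0];
        for (int i = 1; i < slots.length; i++) winner = decide(winner, slots[i]);
        return winner;
    }

    public static void main(String[] args) {
        Slot[] slots = {
            new Slot(0, 1, 900), new Slot(1, 3, 1200), new Slot(2, 3, 800), new Slot(3, 2, 100) };
        System.out.println("next stream-slot: " + selectNext(slots).id());
    }
}
```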
  • Item
    A Practical Approach for Zero Downtime in an Operational Information System
    (Georgia Institute of Technology, 2002) Gavrilovska, Ada ; Schwan, Karsten ; Oleson, Van
    An Operational Information System (OIS) supports a real-time view of an organization's information critical to its logistical business operations. A central component of an OIS is an engine that integrates data events captured from distributed, remote sources in order to derive meaningful real-time views of current operations. This Event Derivation Engine (EDE) continuously updates these views and also publishes them to a potentially large number of remote subscribers. This paper describes a sample OIS and EDE in the context of an airline's operations. It then defines the performance and availability requirements to be met by this system, specifically focusing on the EDE component. One particular requirement for the EDE is that subscribers to its output events should not experience downtime due to EDE failures and crashes or increased processing loads. This paper describes a practical technique for masking failures and for hiding the costs of recovery from EDE subscribers. This technique utilizes redundant EDEs that coordinate view replicas with a relaxed synchronous fault tolerance protocol. A combination of pre- and post-buffering replicas is used to attain an optimal solution, one that still prevents system-wide failure in the face of deterministic faults, such as ill-formed messages. By minimizing the amount of synchronization used across replicas, the resulting zero-downtime EDE can be scaled to support the large number of subscribers it must service.
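One way to picture how recovery can be hidden from subscribers is a backup Event Derivation Engine that derives the same views as the primary but only buffers them, publishing from the buffer the moment the primary is declared failed. The sketch below is a simplification under that assumption; the paper's actual protocol additionally relaxes synchronization between replicas and handles deterministic faults such as ill-formed messages, neither of which is shown here.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Illustrative sketch, not the paper's protocol: a backup EDE that shadows the
// primary and takes over publishing when the primary fails.
public class EdeFailoverSketch {

    interface Publisher { void publish(String view); }

    static class BackupEde {
        private final Deque<String> bufferedViews = new ArrayDeque<>();
        private final Publisher publisher;
        private volatile boolean active = false;   // becomes true when the primary fails

        BackupEde(Publisher publisher) { this.publisher = publisher; }

        /** Derive a view from an incoming event; buffer it unless we have taken over. */
        void onEvent(String event) {
            String view = "derived(" + event + ")";   // stand-in for real view derivation
            if (active) publisher.publish(view);
            else bufferedViews.addLast(view);
        }

        /** Called when the primary's failure is detected: flush and take over. */
        void takeOver() {
            active = true;
            // A real protocol would first discard views the primary is known
            // to have already published, rather than re-publishing them all.
            while (!bufferedViews.isEmpty()) publisher.publish(bufferedViews.pollFirst());
        }
    }
}
```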
  • Item
    Method Partitioning - Runtime Customization of Pervasive Programs without Design-time Application Knowledge
    (Georgia Institute of Technology, 2002) Zhou, Dong ; Pande, Santosh ; Schwan, Karsten
    Heterogeneity, decoupling, and dynamics in distributed, component-based applications indicate the need for dynamic program customization and adaptation. Method Partitioning is a dynamic, unit-placement-based technique for customizing performance-critical message-based interactions between program components, at runtime and without the need for design-time application knowledge. The technique partitions message handling functions, and offers high customizability and low-cost adaptation of such partitioning. It consists of (a) static analysis of message handling methods to produce candidate partitioning plans for the methods, (b) cost models for evaluating the costs and benefits of different partitioning plans, (c) a Remote Continuation mechanism that "connects" the distributed parts of a partitioned method at runtime, and (d) Runtime Profiling and Reconfiguration Units that monitor the actual costs of candidate partitioning plans and dynamically select the "best" plans from the candidates. A prototypical implementation of Method Partitioning in the JECho distributed event system is applied to two distributed applications: (1) a communication-bound application running on a wireless-connected mobile platform, and (2) a compute-intensive code mapped to power-limited and therefore computationally limited embedded processors. Experiments with Method Partitioning demonstrate significant performance improvements for both types of applications, derived from the fine-grain, low-overhead adaptation actions applied whenever necessitated by changes in program behavior or environment characteristics.
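The cost-model step (b) and the runtime plan selection in (d) above can be pictured with a simple latency estimate. The sketch below is illustrative only: the Plan fields and the cost formula are assumptions rather than the paper's actual model, and a real system would feed the estimates from the runtime profiling units.

```java
import java.util.List;

// Sketch only: choosing among candidate partitioning plans with a simple cost model.
public class PlanSelectionSketch {

    /** A candidate split of a message-handling method between producer and consumer sides. */
    record Plan(String name, long bytesShipped, double producerCycles, double consumerCycles) {}

    /** Estimated handling latency under current bandwidth (bytes/s) and CPU speeds (cycles/s). */
    static double estimatedCost(Plan p, double bandwidth, double producerHz, double consumerHz) {
        return p.bytesShipped() / bandwidth
             + p.producerCycles() / producerHz
             + p.consumerCycles() / consumerHz;
    }

    /** Re-evaluated at runtime as profiled bandwidth and CPU availability change. */
    static Plan best(List<Plan> candidates, double bandwidth, double producerHz, double consumerHz) {
        Plan bestPlan = candidates.get(0);
        for (Plan p : candidates)
            if (estimatedCost(p, bandwidth, producerHz, consumerHz)
                    < estimatedCost(bestPlan, bandwidth, producerHz, consumerHz))
                bestPlan = p;
        return bestPlan;
    }
}
```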
  • Item
    Optimizing Dynamic Producer/Consumer Style Applications in Embedded Environments
    (Georgia Institute of Technology, 2002) Zhou, Dong ; Pande, Santosh ; Schwan, Karsten
    Many applications in pervasive computing environments are subject to resource constraints in terms of limited bandwidth and processing power. As such applications grow in scale and complexity, these constraints become increasingly difficult to predict at design and deployment times. Runtime adaptation is hence required to cope with the dynamics of such constraints. However, to keep such adaptation lightweight, it is important to statically gather relevant program information and thereby reduce the runtime overhead of dynamic adaptation. This paper presents methods that use both static program analysis and runtime profiling to support the adaptation of producer/consumer-style pervasive applications. It demonstrates these methods with a network traffic-centric cost model and a program execution time-centric cost model. A communication bandwidth-critical application and a computation-intensive application are used to demonstrate the significant performance improvement opportunities offered by these methods in the presence of the respective resource constraints.
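To make the two cost models mentioned above concrete, the sketch below applies a network traffic-centric and an execution time-centric objective to a hypothetical choice of where to run a data-reducing filter. The placements and numbers are invented for illustration and do not come from the paper.

```java
// Sketch only: two objective functions applied to a hypothetical filter placement.
public class CostModelSketch {

    record Placement(String where, long bytesOnWire, double totalSeconds) {}

    /** Network traffic-centric model: minimize bytes crossing the constrained link. */
    static Placement byTraffic(Placement a, Placement b) {
        return a.bytesOnWire() <= b.bytesOnWire() ? a : b;
    }

    /** Execution time-centric model: minimize end-to-end handling time. */
    static Placement byTime(Placement a, Placement b) {
        return a.totalSeconds() <= b.totalSeconds() ? a : b;
    }

    public static void main(String[] args) {
        Placement atProducer = new Placement("producer", 10_000, 0.08);   // filter before sending
        Placement atConsumer = new Placement("consumer", 200_000, 0.05);  // ship raw data, filter later
        System.out.println("bandwidth-constrained choice: " + byTraffic(atProducer, atConsumer).where());
        System.out.println("compute-constrained choice:   " + byTime(atProducer, atConsumer).where());
    }
}
```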
  • Item
    dproc - Extensible Run-Time Resource Monitoring for Cluster Applications
    (Georgia Institute of Technology, 2002) Jancic, Jasmina ; Poellabauer, Christian ; Schwan, Karsten ; Wolf, Matthew ; Bright, Neil
    In this paper we describe the dproc (distributed /proc) kernel-level mechanisms and abstractions, which provide the building blocks for the implementation of efficient, cluster-wide, and application-specific performance monitoring. Such monitoring functionality may be constructed at any time, both before and during application invocation, and can include dynamic run-time extensions. This paper (i) presents dproc's implementation in a Linux-based cluster of SMP machines, and (ii) evaluates its utility by constructing sample monitoring functionality.
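For readers unfamiliar with /proc-style monitoring, the sketch below samples a standard Linux /proc metric from user level on a single node. It is only a stand-in for the idea: dproc itself operates at kernel level and makes such information available cluster-wide, with application-specific filtering and extensions.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

// Sketch only: periodically sampling a /proc-style metric on the local node.
public class LoadSampleSketch {

    /** Read the 1-minute load average from the local /proc filesystem (Linux only). */
    static double oneMinuteLoad() throws Exception {
        List<String> lines = Files.readAllLines(Path.of("/proc/loadavg"));
        return Double.parseDouble(lines.get(0).split("\\s+")[0]);
    }

    public static void main(String[] args) throws Exception {
        // A cluster-wide monitor would gather such samples from every node,
        // filtered and aggregated according to the application's needs.
        for (int i = 0; i < 3; i++) {
            System.out.printf("load(1m) = %.2f%n", oneMinuteLoad());
            Thread.sleep(1000);
        }
    }
}
```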
  • Item
    RASA (Reconfigurable Architectures for Scheduling Activities) Architecture and Hardware for Scheduling Gigabit Packet Streams
    (Georgia Institute of Technology, 2002) Krishnamurthy, Rajaram B. ; Yalamanchili, Sudhakar ; Schwan, Karsten ; West, Richard
    We present an architecture and hardware for scheduling gigabit packet streams in server clusters, combining a Network Processor datapath and an FPGA for use in server NICs and server cluster switches. Our architectural framework can provide EDF, static-priority, fair-share, and native DWCS scheduling support for best-effort and real-time streams. This (i) allows interoperability of scheduling hardware supporting different scheduling disciplines and (ii) helps provide customized scheduling solutions in server clusters based on traffic type, stream content, stream volume, and cluster hardware, using a hardware implementation of a scheduler running at wire speeds. The architecture scales easily from 4 to 32 streams on a single Xilinx Virtex 1000 chip and can support 64- to 1500-byte Ethernet frames on a 1 Gbps link and 1500-byte Ethernet frames on a 10 Gbps link. A running hardware prototype of a stream scheduler in a Virtex 1000 PCI card can divide bandwidth based on user specifications and meet the temporal bounds and packet-time requirements of multi-gigabit links.
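A software caricature of the policy mix described above, in which real-time streams are served earliest-deadline-first and best-effort streams only when no real-time stream is backlogged, is sketched below. The actual design is an FPGA and network-processor datapath running at wire speed; the record fields and selection logic here are illustrative assumptions only.

```java
import java.util.List;
import java.util.Optional;

// Sketch only: mixing real-time and best-effort streams in software.
public class MixedSchedulerSketch {

    record PacketStream(int id, boolean realTime, long headDeadline, boolean backlogged) {}

    /** Pick the next stream: earliest-deadline real-time stream first, then any backlogged best-effort stream. */
    static Optional<PacketStream> next(List<PacketStream> streams) {
        Optional<PacketStream> rt = streams.stream()
                .filter(s -> s.backlogged() && s.realTime())
                .min((a, b) -> Long.compare(a.headDeadline(), b.headDeadline()));
        if (rt.isPresent()) return rt;
        return streams.stream().filter(s -> s.backlogged() && !s.realTime()).findFirst();
    }
}
```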
  • Item
    Effects of Reconfiguration on Performance in Configurable Operating Systems: Practical Predictability Strategies?
    (Georgia Institute of Technology, 2001) Krishnamurthy, Rajaram B. ; Schwan, Karsten
    Critical systems must be configured to meet changing functionality, time-criticality, and fault-tolerance needs. Configurations may be performed statically, at operating system build or boot time. Dynamic configurations are also possible during run-time, when the operating system is loaded and running. For example, operating system kernel modules, operating system components, middleware, and application programs may all be configured statically or dynamically. Operating systems may be configured to scale from ROMable versions to full-fledged multiprocessor clusters, which we term the horizontal configurability feature. In addition, to address different types of applications, operating systems may be configured with enhanced or reduced functionality, which we term the vertical configurability feature. Finally, when such configurations are performed, higher-level operating system components, middleware, and other Commercial Off-The-Shelf (COTS) software products expect to experience the gains derived from such configurations in terms of enhanced performance or predictability.
  • Item
    Adaptable Mirroring in Cluster Servers
    (Georgia Institute of Technology, 2001) Gavrilovska, Ada ; Schwan, Karsten ; Oleson, Van
    This paper presents a software architecture for continuously mirroring streaming data received by one node of a cluster-based server system to other cluster nodes. The intent is to distribute the server loads implied by the data's processing and distribution to large numbers of clients. This is particularly important when a server performs multiple processing tasks, some of which may heavily depend on current system status. One specific example is the preparation of suitable initialization state for thin clients, so that such clients can understand future data events being streamed to them. In particular, when large numbers of thin clients must be initialized at the same time, such initialization must be performed without jeopardizing the quality of service offered to regular clients continuing to receive data streams. The mirroring framework presented and evaluated here has several novel aspects. First, by performing mirroring at the middleware level, application semantics may be used to reduce mirroring traffic: by event filtering based on data type or even data content, by coalescing certain events, or by simply varying mirroring rates according to current application needs concerning the consistency of mirrored vs. original data. The intent of such dynamically varied mirroring is to improve server scalability, both with respect to the server's ability to stream data events to a large number of clients and to deal with large and highly variable request volumes from clients that require other services, such as new initial states computed from incoming data events. Second, this paper presents an adaptive algorithm that varies mirror consistency, and thereby mirroring overheads, in response to changes in clients' request behavior. The third novel aspect of this work is the framework's ability not only to mirror events, but also to mirror the new states computed from incoming events, thus enabling dynamic tradeoffs in the communication vs. computation loads imposed on the server node receiving events and on its mirror nodes. This framework capability is used for adaptive event coalescing in response to increases or decreases in client request loads. Fourth, mirroring functionality is structured so that it is easily 'split' across nodes' main CPUs and the CoProcessors resident on their programmable network interfaces (should such interfaces exist). A case study of a mirroring design for an Intel I2O-based interface board using an i960 RD CoProcessor is also presented in this paper.
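The adaptive algorithm described above trades mirror consistency against mirroring overhead. One plausible shape of that loop, with invented thresholds and names rather than the paper's actual algorithm, is sketched below: the number of events coalesced into a single mirror update shrinks when many clients request fresh initial state and grows when request load is low.

```java
// Sketch only: adapting how aggressively events are coalesced before being
// forwarded to a mirror node, based on observed client initialization requests.
public class AdaptiveMirrorSketch {

    private int coalesceWindow = 16;   // events merged into one mirror update

    /** Tighten consistency when many clients need fresh initial state; relax it otherwise. */
    void adapt(double initRequestsPerSecond) {
        if (initRequestsPerSecond > 100) {
            coalesceWindow = Math.max(1, coalesceWindow / 2);    // mirror more often
        } else if (initRequestsPerSecond < 10) {
            coalesceWindow = Math.min(256, coalesceWindow * 2);  // coalesce more, reduce mirroring traffic
        }
    }

    int coalesceWindow() { return coalesceWindow; }
}
```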