CERCS Technical Report Series

Series Type
Publication Series
Associated Organization(s)

Publication Search Results

Now showing 1 - 10 of 193
  • Item
    Shared Address Space I/O: A Novel I/O Approach for System-on-a-Chip Networking
    (Georgia Institute of Technology, 2004-02-19) Sun, Di-Shi ; Blough, Douglas M.
    For real-time system-on-a-chip (SoC) network applications, high-speed, low-latency network I/O is key to achieving predictable execution and high performance. Existing network I/O approaches are either not directly suited to SoC applications or too complicated and expensive. This paper introduces a novel approach, referred to as shared address space I/O, for real-time SoC network applications. The approach facilitates the building of heterogeneous multiprocessor systems comprising application-intensive processors (main processors) and I/O-intensive processors (I/O processors), in which network I/O processing can be offloaded to a specialized I/O processor. With shared address space I/O, communication and synchronization between the main and I/O processors are implemented through a shared address space. The approach is realized in Atalanta, a heterogeneous real-time SoC operating system we have developed. In this paper, we demonstrate that shared address space I/O can provide high-speed, low-latency network I/O for SoC network applications.
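The core idea, offloading network I/O to an I/O processor that shares an address space with the main processor, can be sketched with a single-producer/single-consumer ring buffer. This is a toy illustration only: the class and method names are ours, and the paper's Atalanta OS implements the mechanism on real heterogeneous SoC hardware rather than in Python.

```python
# Toy sketch of shared-address-space I/O (hypothetical names).
# The main and I/O "processors" communicate through one shared ring
# buffer, so no data is copied between separate address spaces.

class SharedRing:
    """Single-producer/single-consumer ring in a shared address space."""
    def __init__(self, slots):
        self.buf = [None] * slots
        self.head = 0   # next slot the producer (I/O processor) writes
        self.tail = 0   # next slot the consumer (main processor) reads
        self.slots = slots

    def put(self, pkt):
        nxt = (self.head + 1) % self.slots
        if nxt == self.tail:          # ring full: refuse, never overwrite
            return False
        self.buf[self.head] = pkt
        self.head = nxt
        return True

    def get(self):
        if self.tail == self.head:    # ring empty
            return None
        pkt = self.buf[self.tail]
        self.tail = (self.tail + 1) % self.slots
        return pkt

# The I/O processor would fill the ring from the NIC; the main
# processor drains it without any cross-address-space copy.
ring = SharedRing(4)
ring.put(b"pkt0")
ring.put(b"pkt1")
received = [ring.get(), ring.get()]
```

Because producer and consumer touch disjoint indices, this shape needs only lightweight synchronization, which is one reason a shared address space can beat message-based I/O paths on latency.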
  • Item
    µsik -- A Micro Kernel for Parallel/Distributed Simulation
    (Georgia Institute of Technology, 2004-05-26) Perumalla, Kalyan S.
    We present a novel micro-kernel approach to parallel/distributed simulation. Using the micro-kernel approach, we develop a unified architecture for incorporating multiple types of simulation processes. Processes can employ a variety of synchronization mechanisms and can alter their choice of mechanism dynamically. Supported mechanisms include traditional lookahead-based conservative and state saving-based optimistic execution approaches, as well as newer mechanisms such as reverse computation-based optimistic execution and aggregation-based event processing, all within a single parsimonious application programming interface (API). We also present the internal implementation and a preliminary performance evaluation of this interface in µsik, an efficient parallel/distributed realization of our micro-kernel architecture in C++.
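The separation the abstract describes, a small kernel that schedules logical processes while each process carries its own synchronization policy, can be sketched as follows. All names are hypothetical and the conservative-lookahead rule shown is just one of the policies µsik supports; its real C++ API differs.

```python
# Minimal sketch of a micro-kernel scheduling simulation processes with
# pluggable synchronization (hypothetical API, illustration only).
import heapq

class ConservativeLP:
    """Logical process that only executes events proven safe by lookahead."""
    def __init__(self, name, lookahead):
        self.name, self.lookahead = name, lookahead
        self.events = []          # (timestamp, payload) min-heap
        self.processed = []

    def post(self, ts, payload):
        heapq.heappush(self.events, (ts, payload))

    def safe_time(self, peer_min_ts):
        # Events earlier than the peers' minimum timestamp plus this LP's
        # lookahead cannot be invalidated by any future arrival.
        return peer_min_ts + self.lookahead

    def execute_until(self, bound):
        while self.events and self.events[0][0] < bound:
            self.processed.append(heapq.heappop(self.events))

class MicroKernel:
    """Dispatches each LP up to its own safe bound. The sync policy lives
    in the LP, not the kernel, so an optimistic LP could coexist here."""
    def __init__(self):
        self.lps = []

    def add(self, lp):
        self.lps.append(lp)

    def step(self):
        global_min = min((lp.events[0][0] for lp in self.lps if lp.events),
                         default=float("inf"))
        for lp in self.lps:
            lp.execute_until(lp.safe_time(global_min))

kernel = MicroKernel()
lp = ConservativeLP("lp0", lookahead=1.0)
lp.post(0.5, "arrive")
lp.post(2.0, "depart")
kernel.add(lp)
kernel.step()   # only the event at t=0.5 is within the safe bound of 1.5
```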
  • Item
    Using Hierarchies for Optimizing Distributed Stream Queries
    (Georgia Institute of Technology, 2006) Seshadri, Sangeetha ; Kumar, Vibhore ; Cooper, Brian F.
    We consider the problem of query optimization in distributed data stream systems where multiple continuous queries may execute simultaneously. To achieve the best performance, query planning (such as join ordering) must be considered in conjunction with deployment planning (e.g., assigning operators to physical nodes). In our scenario, the large number of network nodes, query operators, and opportunities for operator sharing between queries means that brute-force and traditional techniques are too expensive. We propose two algorithms, Bottom-Up and Top-Down, which use hierarchical network partitions to provide scalable query optimization. We present analysis that establishes bounds on the search space and on the sub-optimality achieved by our algorithms. Finally, through simulations and experiments using a prototype deployed on Emulab, we demonstrate the effectiveness of our algorithms. The Top-Down algorithm, for instance, achieved solutions that were, on average, sub-optimal by only 10% while considering less than 1% of the search space.
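The way hierarchy shrinks the deployment search space can be shown on a toy chain query. The cost model, node layout, and refinement step below are invented for illustration; the paper's Bottom-Up and Top-Down algorithms are considerably more sophisticated, but the coarse-plan-then-refine structure is the same idea.

```python
# Toy illustration of hierarchy-based search-space reduction
# (hypothetical cost model; chain query source@0 -> ops -> sink@7).
from itertools import product

NODES = list(range(8))                     # node i sits at position i
PARTITIONS = [[0, 1, 2, 3], [4, 5, 6, 7]]  # a two-level hierarchy
PART_OF = {n: p for p in PARTITIONS for n in p}

def cost(assign):
    # total link distance traversed by the stream through the operators
    path = (0,) + tuple(assign) + (7,)
    return sum(abs(a - b) for a, b in zip(path, path[1:]))

def flat_plan(n_ops):
    """Brute force over |nodes|**n_ops assignments."""
    return min(product(NODES, repeat=n_ops), key=cost)

def hierarchical_plan(n_ops):
    """Top-Down flavor: plan over one representative per partition
    (|partitions|**n_ops plans), then refine each operator inside
    its chosen partition with the others held fixed."""
    reps = [p[0] for p in PARTITIONS]
    assign = list(min(product(reps, repeat=n_ops), key=cost))
    for i in range(n_ops):
        assign[i] = min(PART_OF[assign[i]],
                        key=lambda n: cost(assign[:i] + [n] + assign[i + 1:]))
    return tuple(assign)

flat_cost = cost(flat_plan(3))          # examines 8**3 = 512 assignments
hier_cost = cost(hierarchical_plan(3))  # examines 2**3 + 3*4 = 20
```

Here the hierarchical plan happens to match the brute-force optimum while evaluating about 4% of the assignments; in general the paper's analysis bounds how far such plans can be from optimal.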
  • Item
    Thermal-aware 3D Microarchitectural Floorplanning
    (Georgia Institute of Technology, 2004) Ekpanyapong, Mongkol ; Healy, Michael ; Ballapuram, Chinnakrishnan S. ; Lim, Sung Kyu ; Lee, Hsien-Hsin Sean ; Loh, Gabriel H.
    Next-generation deep submicron processor design will need to take many performance-limiting factors into consideration. Flip-flops are inserted to keep global wire delay from becoming nonlinear, enabling deeper pipelines and higher clock frequencies. The move to 3D ICs will likely shorten wirelength further, but it will cause thermal issues to become a major bottleneck to performance improvement. In this paper we propose a floorplanning algorithm that uses mathematical programming to take into consideration both thermal issues and profile-weighted wirelength. Our profile-driven objective improves performance by 20% over a wirelength-driven objective, while the thermal-driven objective improves temperature by 24% on average over the profile-driven case.
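The trade-off between the two objectives can be illustrated with a weighted cost over candidate floorplans. All numbers, weights, and names below are invented; the paper formulates this as a mathematical program over real block geometry, not a pick-from-a-list selection.

```python
# Toy weighted objective in the spirit of profile- and thermal-driven
# floorplanning (all candidate data and weights are made up).
def combined_cost(plan, alpha=1.0, beta=2.0):
    # plan["wires"]: (length, access_frequency) pairs from profiling
    weighted_wl = sum(l * f for l, f in plan["wires"])
    return alpha * weighted_wl + beta * plan["peak_temp"]

candidates = [
    # densely stacked: short hot wires, but heat concentrates in 3D
    {"name": "stacked_hot", "wires": [(2, 0.9), (1, 0.1)], "peak_temp": 95},
    # spread out: slightly longer wires, much cooler hotspot
    {"name": "spread_cool", "wires": [(3, 0.9), (2, 0.1)], "peak_temp": 70},
]
best = min(candidates, key=combined_cost)
```

With the thermal weight beta active, the cooler plan wins despite its longer profile-weighted wirelength, which mirrors the paper's observation that a thermal-driven objective trades some wirelength for temperature.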
  • Item
    PreDatA - Preparatory Data Analytics on Peta-Scale Machines
    (Georgia Institute of Technology, 2010) Zheng, Fang ; Abbasi, Hasan ; Docan, Ciprian ; Lofstead, Jay ; Klasky, Scott ; Liu, Qing ; Parashar, Manish ; Podhorszki, Norbert ; Schwan, Karsten ; Wolf, Matthew ; Georgia Institute of Technology. College of Computing ; Georgia Institute of Technology. Center for Experimental Research in Computer Systems ; Rutgers University. Center for Autonomic Computing ; Oak Ridge National Laboratory
    Peta-scale scientific applications running on High End Computing (HEC) platforms can generate large volumes of data. For high-performance storage, and in order to be useful to science end users, such data must be organized in its layout, indexed, sorted, and otherwise manipulated for subsequent data presentation, visualization, and detailed analysis. In addition, scientists desire to gain insights into selected data characteristics ‘hidden’ or ‘latent’ in the massive datasets while the data is being produced by simulations. PreDatA, short for Preparatory Data Analytics, is an approach for preparing and characterizing data while it is being produced by large-scale simulations running on peta-scale machines. By dedicating additional compute nodes on the peta-scale machine as staging nodes and staging the simulation’s output data through these nodes, PreDatA can exploit their computational power to perform selected data manipulations with lower latency than attainable by first moving the data into file systems and storage. Such in-transit manipulations are supported by the PreDatA middleware through RDMA-based data movement that reduces write latency, application-specific operations on streaming data that discover latent data characteristics, and data reorganization and metadata annotation that speed up subsequent data access. As a result, PreDatA enhances the scalability and flexibility of the current I/O stack on HEC platforms and is useful for data pre-processing, runtime data analysis and inspection, and data exchange between concurrently running simulation models. Performance evaluations with several production peta-scale applications on Oak Ridge National Laboratory’s Leadership Computing Facility demonstrate the feasibility and advantages of the PreDatA approach.
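The staging pattern, in which output flows through intermediate nodes that manipulate it in transit before it reaches storage, can be sketched with threads and a queue. This is only a shape sketch: the queue stands in for PreDatA's RDMA transport, and the sort-plus-index step is a made-up example of an application-specific in-transit manipulation.

```python
# Sketch of in-transit staging: a "simulation" thread hands raw output
# to a staging worker, which sorts it and attaches a min/max index
# record before it lands in "storage" (names and policy are ours).
import threading
import queue

raw = queue.Queue()   # simulation -> staging node (stand-in for RDMA)
storage = []          # what finally lands on disk

def staging_worker():
    while True:
        chunk = raw.get()
        if chunk is None:                 # shutdown sentinel
            break
        ordered = sorted(chunk)           # in-transit reorganization
        storage.append({"index": (ordered[0], ordered[-1]),  # metadata
                        "data": ordered})

stager = threading.Thread(target=staging_worker)
stager.start()
for chunk in ([3, 1, 2], [9, 7, 8]):      # two simulation output steps
    raw.put(chunk)
raw.put(None)
stager.join()
```

The simulation thread never blocks on sorting or indexing, which is the latency argument: the staging nodes absorb the manipulation cost while data is already in motion toward storage.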
  • Item
    A Hybrid Access Model for Storage Area Networks
    (Georgia Institute of Technology, 2004) Singh, Aameek ; Voruganti, Kaladhar ; Gopisetty, Sandeep ; Pease, David ; Liu, Ling
    We present HSAN, a hybrid storage area network that uses both in-band (like NFS) and out-of-band (like SAN FS) virtualization access models. Using hybrid servers that can serve as both metadata and NAS servers, HSAN intelligently decides the access model for each request based on the characteristics of the requested data. The hybrid model is implemented using low-overhead cache-admission and cache-replacement schemes and aims to improve overall response times for a wide variety of workloads. Preliminary analysis of the hybrid model indicates performance improvements over both models.
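A per-request decision of this kind might look like the sketch below. The thresholds and the specific policy are invented for illustration; HSAN's actual cache-admission and cache-replacement schemes are more nuanced than a size cutoff.

```python
# Hypothetical per-request access-model decision (illustrative only).
def choose_access_model(size_bytes, cached_on_hybrid_server,
                        small_file_limit=64 * 1024):
    """In-band (NAS-style) for small or already-cached data, where one
    round trip to the hybrid server is cheapest; out-of-band (SAN-style)
    for large data, where direct block access amortizes the metadata
    lookup over a long transfer."""
    if cached_on_hybrid_server or size_bytes <= small_file_limit:
        return "in-band"
    return "out-of-band"

# a 4 KB config file vs. a 10 MB media file, neither cached:
decisions = (choose_access_model(4096, False),
             choose_access_model(10 * 1024 * 1024, False))
```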
  • Item
    POD: A Parallel-On-Die Architecture
    (Georgia Institute of Technology, 2007-05) Woo, Dong Hyuk ; Fryman, Joshua Bruce ; Knies, Allan D. ; Eng, Marsha ; Lee, Hsien-Hsin Sean
    As power constraints, complexity, and design verification cost make it difficult to improve single-stream performance, the parallel computing paradigm is taking its place among mainstream high-volume architectures. Most current commercial designs focus on MIMD-style CMPs built from rather complex single cores. While such designs provide a degree of generality, they may not be the most efficient way to build processors for applications with inherently scalable parallelism. These designs have been proven to work well for certain classes of applications, such as transaction processing, but they have driven the development of new languages and complex architectural features. Instead of building MIMD CMPs for all workloads, we propose an alternative parallel on-die many-core architecture called POD, based on a large SIMD PE array. POD addresses the key challenges of on-chip communication bandwidth, area limitations, and router energy consumption by factoring out features necessary for MIMD machines and focusing on architectures that match many scalable workloads. In this paper, we evaluate and quantify the advantages of the POD architecture by basing its ISA on a commercially relevant CISC architecture, and we show that it can be as efficient as more specialized array processors based on one-off ISAs. Our single-chip POD is capable of best-in-class scalar performance and up to 1.5 TFLOPS of single-precision floating-point arithmetic. Our experimental results show that in some application domains our architecture can achieve nearly linear speedup on a large number of SIMD PEs, a speedup much greater than the maximum that MIMD CMPs on the same die size can achieve. Furthermore, owing to its synchronized computation and communication, POD can efficiently suppress the energy consumed by the novel communication method in our interconnection network.
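The SIMD execution model underlying POD, one instruction stream applied in lockstep across many processing-element lanes, can be sketched in a few lines. This toy omits everything that makes POD interesting in hardware (the on-die interconnect, the CISC-derived ISA, the energy accounting); it only shows the lockstep model itself, with names of our choosing.

```python
# Toy lockstep SIMD PE array: one instruction, many data lanes.
class SimdArray:
    def __init__(self, lanes):
        self.regs = [0] * lanes   # one register per processing element

    def broadcast(self, values):
        self.regs = list(values)  # load each lane's private data

    def execute(self, op):
        # every PE applies the same op to its own lane in the same cycle
        self.regs = [op(x) for x in self.regs]

pod = SimdArray(4)
pod.broadcast([1, 2, 3, 4])
pod.execute(lambda x: x * x)      # one instruction, four results
```

Because all lanes advance together, there is no per-lane control or routing decision to pay for, which is the intuition behind the abstract's claim about suppressed communication energy.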
  • Item
    KStreams: Kernel Support for Efficient End-to-End Data Streaming
    (Georgia Institute of Technology, 2004) Kong, Jiantao ; Schwan, Karsten
    Technology advances are enabling increasingly data-intensive applications, ranging from peer-to-peer file sharing, to multimedia, to remote graphics and data visualization. One outcome is the considerable memory pressure imposed on the machines involved, caused by application-specific data movements and by repeated crossings of user/kernel boundaries. We address this problem with a novel system service, termed KStreams, a general facility for manipulating data without intermediate buffers as it moves across multiple kernel objects, such as files or sockets. KStreams may be used to implement kernel-level services that range from application-specific implementations of sendfile commands, to data mirroring or proxy functions, to fast path data conversions and transformations for data streaming. The KStreams API permits individual applications to define fast path operations, which then execute at kernel level and, if desired, without further application involvement. By placing application-specific data manipulations into data movement fast paths, user/kernel boundary crossings are avoided. By operating on data streams 'in-flight', data buffering is made unnecessary, further reducing the memory pressure imposed on machines. KStreams is implemented on Linux kernel version 2.4.22. Its evaluation uses data-intensive tasks performed in conjunction with modern web services, such as proxy functions, remote media streaming, and data visualization. Initial experiences with the KStreams implementation are encouraging: fast path data transformation via KStreams increases throughput by 20-50% compared to user-level data manipulations. Future work will apply KStreams to complex multi-machine web services, evaluated with representative user loads and applications.
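The fast-path idea, transforming data while it flows between two kernel objects rather than buffering it in user space, can be sketched with generators standing in for the in-kernel stream splicing. The function names are ours and nothing here resembles the real KStreams API; the sketch only shows why in-flight transformation needs no full intermediate buffer.

```python
# Sketch of an in-flight fast path (generators stand in for KStreams'
# in-kernel splicing between, e.g., a file and a socket).
def source(chunks):
    for c in chunks:            # e.g. chunks read from a file
        yield c

def fastpath(stream, transform):
    # each chunk is transformed as it passes through; at no point does
    # the whole stream exist in an intermediate buffer
    for chunk in stream:
        yield transform(chunk)

def sink(stream):
    return b"".join(stream)     # e.g. chunks written to a socket

out = sink(fastpath(source([b"hello ", b"world"]), bytes.upper))
```

In the real system the transform runs at kernel level, so the data never crosses the user/kernel boundary at all; the generator pipeline only mimics the chunk-at-a-time dataflow.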
  • Item
    Using Byzantine Quorum Systems to Manage Confidential Data
    (Georgia Institute of Technology, 2004-04-01) Subbiah, Arun ; Ahamad, Mustaque ; Blough, Douglas M.
    This paper addresses the problem of using proactive cryptosystems for generic data storage and retrieval. Proactive cryptosystems provide high security and confidentiality guarantees for stored data and are capable of withstanding attacks that may compromise all the servers in the system over time. However, proactive cryptosystems are unsuitable for generic data storage for two reasons. First, proactive cryptosystems are usually used to store keys, which are rarely updated, whereas generic data may be actively written and read; the system must therefore be highly available for both write and read operations. Second, existing share renewal protocols (the critical element in achieving proactive security) are expensive in terms of computation and communication overheads and are time-consuming operations. Since generic data will be voluminous, the share renewal process will consume substantial system resources and cause a significant amount of system downtime. Two schemes are proposed that combine Byzantine quorum systems and proactive secret sharing techniques to provide high availability and security guarantees for stored data, while reducing the overhead incurred during the share renewal process. Several performance metrics that can be used to evaluate proactively secure generic data storage schemes are identified. The proposed schemes are thus shown to render proactive systems suitable for confidential generic data storage.
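The two building blocks the paper combines, threshold secret sharing and proactive share renewal, can be sketched concretely with Shamir sharing over a small prime field. The field size and parameters are illustrative, and this sketch omits the Byzantine quorum protocol and the verifiability machinery that real proactive schemes require; it only shows why renewal changes every share while leaving the secret intact.

```python
# Toy Shamir secret sharing with proactive renewal (illustrative field;
# real schemes use large fields plus verifiable sharing and quorums).
import random

P = 2087  # small prime field modulus

def poly_eval(coeffs, x):
    return sum(c * pow(x, i, P) for i, c in enumerate(coeffs)) % P

def share(secret, k, n):
    """Split secret into n shares; any k reconstruct it."""
    coeffs = [secret] + [random.randrange(P) for _ in range(k - 1)]
    return {x: poly_eval(coeffs, x) for x in range(1, n + 1)}

def reconstruct(shares):
    """Lagrange interpolation at x = 0 over the prime field."""
    pts = list(shares.items())
    secret = 0
    for i, (xi, yi) in enumerate(pts):
        num = den = 1
        for j, (xj, _) in enumerate(pts):
            if i != j:
                num = num * (-xj) % P
                den = den * (xi - xj) % P
        secret = (secret + yi * num * pow(den, -1, P)) % P
    return secret

def renew(shares, k):
    """Proactive renewal: every server adds a fresh share of zero, so
    all shares change (old leaked shares become useless with new ones)
    while the underlying secret is unchanged."""
    zero_shares = share(0, k, len(shares))
    return {x: (y + zero_shares[x]) % P for x, y in shares.items()}

old = share(secret=1234, k=3, n=5)
new = renew(old, k=3)
quorum = dict(list(new.items())[:3])   # any k fresh shares suffice
```

The cost visible even in this toy, a full resharing pass touching every server, is exactly the renewal overhead whose amortization over voluminous generic data motivates the paper's schemes.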
  • Item
    Autonomic Information Flows
    (Georgia Institute of Technology, 2005) Schwan, Karsten ; Cooper, Brian F. ; Eisenhauer, Greg S. ; Gavrilovska, Ada ; Wolf, Matthew ; Abbasi, Hasan ; Agarwala, Sandip ; Cai, Zhongtang ; Kumar, Vibhore ; Lofstead, Jay ; Mansour, Mohamed S. ; Seshasayee, Balasubramanian ; Widener, Patrick M. (Patrick McCall)
    Today's enterprise systems and applications implement functionality that is critical to the ability of society to function. These complex distributed applications, therefore, must meet dynamic criticality objectives even when running on shared heterogeneous and dynamic computational and communication infrastructures. Focusing on the broad class of applications structured as distributed information flows, the premise of our research is that it is difficult, if not impossible, to meet their dynamic service requirements unless these applications exhibit autonomic or self-adjusting behaviors that are 'vertically' integrated with underlying distributed systems and hardware. Namely, their autonomic functionality should extend beyond the dynamic load balancing or request routing explored in current web-based software infrastructures to (1) exploit the ability of middleware or systems to be aware of underlying resource availabilities, (2) dynamically and jointly adjust the behaviors of interacting elements of the software stack being used, and even (3) dynamically extend distributed platforms with enterprise functionality (e.g., network-level business rules for data routing and distribution). The resulting vertically integrated systems can meet stringent criticality or performance requirements, reduce potentially conflicting behaviors across applications, middleware, systems, and resources, and prevent breaches of the 'performance firewalls' that isolate critical from non-critical applications. This paper uses representative information flow applications to argue the importance of vertical integration for meeting criticality requirements. This is followed by a description of the AutoFlow middleware, which offers methods that drive the control of application services with runtime knowledge of current resource behavior. Finally, we demonstrate the opportunities derived from the additional ability of AutoFlow to enhance such methods by also dynamically extending and controlling the underlying software stack, first to better understand its behavior and second, to dynamically customize it to better meet current criticality requirements.
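The autonomic behavior described, application services adjusting themselves using runtime knowledge of resource availability, can be illustrated with a tiny adaptation policy. The thresholds and fidelity levels are entirely hypothetical; AutoFlow's real controllers span middleware and lower stack layers rather than a single function.

```python
# Hypothetical autonomic adaptation policy for an information-flow
# operator: degrade stream fidelity as measured bandwidth drops, so
# critical data still arrives on time instead of stalling the flow.
def pick_fidelity(available_bandwidth_mbps):
    if available_bandwidth_mbps >= 100:
        return "full"            # ship complete data products
    if available_bandwidth_mbps >= 10:
        return "downsampled"     # shed detail, keep structure
    return "summary-only"        # last resort: metadata and aggregates

# a monitoring loop would feed in fresh measurements each period:
responses = [pick_fidelity(bw) for bw in (500, 40, 2)]
```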