Person:
Mooney, Vincent John, III

Publication Search Results

  • Item
    Atalanta: A New Multiprocessor RTOS Kernel for System-on-a-Chip Applications
    (Georgia Institute of Technology, 2002) Sun, Di-Shi ; Blough, Douglas M. ; Mooney, Vincent John, III
    This paper introduces a new multiprocessor real-time operating system (RTOS) kernel that is designed as a software platform for System-on-a-Chip (SoC) applications and hardware/software codesign research. The kernel has the key features of an RTOS, such as multitasking, event-driven priority-based preemptive scheduling, and interprocess communication and synchronization. Atalanta has features important for SoC applications, such as a small, compact, deterministic, modular, and library-based architecture. Atalanta also supports special features such as priority inheritance and user configurability. Atalanta supports multiple processors with synchronization based on atomic read-modify-write operations.
  • Item
    Round-robin Arbiter Design and Generation
    (Georgia Institute of Technology, 2002) Shin, Eung Seo ; Mooney, Vincent John, III ; Riley, George F.
    In this paper, we introduce a Round-robin Arbiter Generator (RAG) tool. The RAG tool can generate a design for a Bus Arbiter (BA). The BA handles exactly the number of bus masters specified by the user of RAG, for both on-chip and off-chip buses. RAG can also generate a distributed and parallel hierarchical Switch Arbiter (SA). The first contribution of this paper is the automated generation of a round-robin token-passing BA to reduce time spent on arbiter design. The generated arbiter is fair, fast, and has a low and predictable worst-case wait time. The second contribution of this paper is the design and integration of a distributed fast arbiter, e.g., for a terabit switch, based on 4x4 and 2x2 switch arbiters (SAs). Using a 0.25 µm TSMC standard cell library from LEDA Systems [11, 15], we show the arbitration time of a 256x256 SA for a terabit switch and demonstrate that the SA generated by RAG meets the time constraint to achieve approximately six terabits of throughput in a typical network switch design. Furthermore, our generated SA performs better than the Ping-Pong Arbiter and Programmable Priority Encoder by a factor of 1.9X and 2.4X, respectively.
  • Item
    Automated Bus Generation for Multiprocessor SoC Design
    (Georgia Institute of Technology, 2002) Ryu, Kyeong Keol ; Mooney, Vincent John, III
    The performance of a system, especially a multiprocessor system, depends heavily on the efficiency of its bus architecture. This paper presents a methodology to generate a custom bus system for a multiprocessor System-on-a-Chip (SoC). Our bus synthesis tool (BusSyn) uses this methodology to generate five different bus systems as examples: Bi-FIFO Bus Architecture (BFBA), Global Bus Architecture Version I (GBAVI), Global Bus Architecture Version III (GBAVIII), Hybrid Bus Architecture (Hybrid) and Split Bus Architecture (SplitBA). We verify and evaluate the performance of each bus system in the context of three applications: an Orthogonal Frequency Division Multiplexing (OFDM) wireless transmitter, an MPEG2 decoder and a database example. This methodology lets the designer quickly explore the design space of bus architectures across a variety of performance-impacting factors such as bus types, processor types and software programming style. In this paper, we show that BusSyn can generate buses that achieve superior performance when compared to a simple General Global Bus Architecture (GGBA) (e.g., a 41% reduction in execution time in the case of a database example). In addition, BusSyn generates a bus architecture in a matter of seconds, instead of the weeks needed to design a custom bus system by hand.
  • Item
    Instruction-level Reverse Execution for Debugging
    (Georgia Institute of Technology, 2002) Akgul, Tankut ; Mooney, Vincent John, III
    Reverse execution provides a programmer with the ability to return a program to a previous state in its execution history. The ability to execute a program in reverse is advantageous for shortening software development time. Conventional techniques for reverse execution rely on saving a state into a record before the state is destroyed. State saving introduces both memory and time overheads during forward execution. Our proposed method introduces a reverse execution methodology at the assembly instruction level with low memory and time overheads. The methodology generates a reverse program from a program, by which a destroyed state is almost always regenerated rather than being restored from a record. This significantly reduces state saving. The methodology has been implemented on a PowerPC processor with a custom-made debugger. Compared to previous work using state saving techniques, the experimental results show 2.5X to 400X memory overhead reduction for the tested benchmarks. Furthermore, the results with the same benchmarks show an average of 4.1X to 5.7X reduction in execution time overhead.
  • Item
    An Access based Energy Model for the Datapath and Memory Hierarchy of HPL-PD Microarchitecture in Trimaran Framework (TRIREME)
    (Georgia Institute of Technology, 2002) Banakar, Rajeshwari ; Ekpanyapong, Mongkol ; Puttaswamy, Kiran ; Rabbah, Rodric Michel ; Balakrishnan, M. ; Mooney, Vincent John, III ; Palem, Krishna V.
    In this paper, a system-level energy model called TRIREME is presented for the HPL-PD microarchitecture, which is used in Trimaran compiler framework studies. The number of accesses for the various computational units is obtained from the Trimaran framework, which also provides the performance estimates. We focus on the details of HPL-PD at the microarchitectural level and on how the energy models for the processor are constructed. Our system-level energy model can be used in the Trimaran framework for energy-aware computing, to validate the energy savings that result from introducing a specific compiler optimization.
  • Item
    Power Optimization of Embedded Memory Systems via Data Remapping
    (Georgia Institute of Technology, 2002) Palem, Krishna V. ; Rabbah, Rodric Michel ; Mooney, Vincent John, III ; Korkmaz, Pinar ; Puttaswamy, Kiran
    In this paper, we provide a novel compile-time data remapping algorithm that runs in linear time. This remapping algorithm is the first fully automatic approach applicable to pointer-intensive dynamic applications. We show that data remapping can be used to significantly reduce the energy consumed as well as the memory size needed to meet a user-specified performance goal (i.e., execution time) -- relative to the same application executing without being remapped. These twin advantages afforded by a remapped program -- improved cache and energy needs -- constitute a key step in a framework for design space exploration: for any given performance goal, remapping allows the user to reduce the primary and secondary cache size by 50%, yielding a concomitant energy savings of 57%. Additionally, viewed as a compiler optimization for a fixed processor, we show that remapping improves the energy consumed by the cache subsystem by 25%. All of the above savings are in the context of the cache subsystem in isolation. We also show that remapping yields an average 20% energy saving for an ARM-like processor and cache subsystem. All of our improvements are achieved in the context of DIS, OLDEN and SPEC2000 pointer-centric benchmarks.
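
The round-robin token-passing arbitration described in the "Round-robin Arbiter Design and Generation" entry above can be illustrated with a small behavioral model. This is only a sketch under stated assumptions, not the design that RAG generates: the class name RoundRobinArbiter, the method arbitrate, and the convention of passing the token to the master just after the one granted are all hypothetical choices made for illustration. In this model, a requesting master waits for at most N - 1 other grants among N masters, which is one way to read the "low and predictable worst-case wait time" claim.

```python
from typing import List, Optional


class RoundRobinArbiter:
    """Behavioral model of a token-passing round-robin arbiter (illustrative only)."""

    def __init__(self, num_masters: int) -> None:
        if num_masters < 1:
            raise ValueError("num_masters must be at least 1")
        self.num_masters = num_masters
        # Index of the master that currently holds the token (highest priority).
        self.token = 0

    def arbitrate(self, requests: List[bool]) -> Optional[int]:
        """Grant the bus to the first requester at or after the token holder.

        Returns the index of the granted master, or None if nobody requests.
        After a grant, the token moves to the master following the winner,
        so the winner has the lowest priority in the next round.
        """
        if len(requests) != self.num_masters:
            raise ValueError("expected one request flag per master")
        for offset in range(self.num_masters):
            candidate = (self.token + offset) % self.num_masters
            if requests[candidate]:
                self.token = (candidate + 1) % self.num_masters
                return candidate
        return None  # no requests: the token stays where it is


if __name__ == "__main__":
    arbiter = RoundRobinArbiter(4)
    # With every master requesting continuously, grants rotate in order.
    print([arbiter.arbitrate([True] * 4) for _ in range(5)])
```

A hardware arbiter would evaluate all requests in parallel rather than looping; the loop here only models the priority order that the token position implies.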