Person:

Mooney, Vincent John, III

Permanent Link

https://hdl.handle.net/1853/71534

Associated Organization(s)

Organizational Unit

School of Electrical and Computer Engineering

Publication Search Results

Timing Analysis for Preemptive Multi-tasking Real-Time Systems with Caches

(Georgia Institute of Technology, 2004) Tan, Yudong ; Mooney, Vincent John, III

In this paper, we propose an approach to estimate the Worst Case Response Time (WCRT) of each task in a preemptive multi-tasking single-processor real-time system with an L1 cache. The approach combines inter-task cache eviction analysis and intra-task cache access analysis to estimate the number of cache lines that can possibly be evicted by the preempting task and also be accessed again by the preempted task after preemptions (thus requiring the preempted task to reload the cache line(s)). This cache reload delay caused by preempting tasks is then incorporated into WCRT analysis. Two sets of applications are used to test our approach. Each set of applications contains three tasks. The experimental results show that our approach can tighten the WCRT estimate by 38% (1.6X) to 56% (2.3X) over prior state-of-the-art.
Loss-Tolerant and Secure Embedded Computing via Inscrutable Instruction-Set Architectures (I²SA)

(Georgia Institute of Technology, 2004) Mooney, Vincent John, III ; Palem, Krishna V. ; Wunderlich, Richard B.

In short, we examine secure computing where a microprocessor reads, writes and executes operations ideally in an inscrutable domain. The goal is to provide methods and implementations for computation where data never leave the inscrutable domain in which they reside; the intended effect is twofold. The first intended effect is that any transmissions between computing media using our approach would be unbreakable in any reasonable amount of time; i.e., intercepted instructions and/or data would be meaningless to the unauthorized interceptor. The second intended effect is that, as a result of embedded computing platforms realized using inscrutable computing elements, any loss of equipment utilizing such computing hardware would not be very meaningful to the recoverer: we refer to this as loss tolerance.
Cache-Related Timing Analysis for Multi-tasking Real-Time Systems with Nested Preemptions

(Georgia Institute of Technology, 2004) Tan, Yudong ; Mooney, Vincent John, III

In this paper, we propose an approach to estimate the Worst Case Response Time (WCRT) of each task in a preemptive multi-tasking single-processor real-time system utilizing an L1 cache. The approach combines inter-task cache eviction analysis and intra-task cache access analysis to estimate the number of cache lines that can possibly be evicted by the preempting task and also be accessed again by the preempted task after preemptions (thus requiring the preempted task to reload the cache line(s)). This cache reload delay caused by preempting task(s) is then incorporated into WCRT analysis. Two sets of applications are used to test our approach. Each set of applications contains three tasks. The experimental results show that our approach can tighten the WCRT estimate by up to 32% (1.4X) over prior state-of-the-art.
Some Layouts Using the Sleepy Stack Approach

(Georgia Institute of Technology, 2004) Pfeiffenberger, Philipp ; Park, Jun Cheol ; Mooney, Vincent John, III

This technical report elaborates on the methodology and findings presented in “Sleepy Stack Reduction of Leakage Power” by J.C. Park, V. J. Mooney III and P. Pfeiffenberger [1]. The scope of this report includes test procedures and data on delay, dynamic and static power for all considered approaches and implementations as well as schematics and layouts for all considered approaches and implementations.
An O(min(m,n)) Parallel Deadlock Detection Algorithm

(Georgia Institute of Technology, 2003) Lee, Jaehwan ; Mooney, Vincent John, III

This paper presents a novel Parallel Deadlock Detection Algorithm (PDDA) and its hardware implementation, the Deadlock Detection Unit (DDU). PDDA uses simple Boolean representations of request, grant and no activity so that the hardware implementation of PDDA becomes easier and operates faster. We prove that the DDU has a worst-case run-time of 2 x min(m, n) - O(min(m,n)), where m is the number of resources and n is the number of processes. Previous algorithms in software, by contrast, have O(m x n) run-time complexity. We also prove the correctness of PDDA and the DDU. The DDU reduces deadlock detection time by 99.9% (i.e., 1000X) or more compared to software implementations of deadlock detection algorithms. An experiment involving a practical situation that employs the DDU showed that the time measured from application initialization to deadlock detection was reduced by 46% compared to detecting deadlock in software.
Golay and Wavelet Error Control Codes in VLSI

(Georgia Institute of Technology, 2003) Balasundaram, Arunkumar ; Pereira, Angelo W. D. ; Park, Jun Cheol ; Mooney, Vincent John, III

This technical report describes AGNI (meaning Fire in Sanskrit) – a VLSI chip to implement error control codes. The chip was initially conceived and designed as part of a Georgia Tech Cutting Edge Research Grant. However, this chip implementation of error control codes has been undertaken as a part of the ECE 6130 course taught in Spring 2002 by Dr. John Uyemura, Professor, Department of Electrical and Computer Engineering, Georgia Institute of Technology. Two coders have been implemented: a (12, 6, 4) wavelet encoder/decoder and a (24, 12, 8) Golay encoder/decoder, where the (N, M, d) nomenclature stands for (N=encoded length, M=message length, d=Hamming distance). These codes have a correctable limit of one bit error and three bit errors, respectively. The following section presents the encoding/decoding functionality of the chip in more detail. This project could potentially feed a future project to incorporate the chip into a System-on-a-Package (SoP). It is expected that the chip would function as a high-speed error encoder/decoder for Radio Frequency (RF) applications.
Atalanta: A New Multiprocessor RTOS Kernel for System-on-a-Chip Applications

(Georgia Institute of Technology, 2002) Sun, Di-Shi ; Blough, Douglas M. ; Mooney, Vincent John, III

This paper introduces a new multiprocessor real-time operating system (RTOS) kernel that is designed as a software platform for System-On-Chip (SoC) applications and hardware/software codesign research purposes. This multiprocessor RTOS kernel has the key features of an RTOS, such as multitasking capabilities, event-driven priority-based preemptive scheduling, and interprocess communication and synchronization. Atalanta has some features important for SoC applications, such as a small, compact, deterministic, modular and library-based architecture. Atalanta also supports some special features such as priority inheritance and user configurability. Atalanta supports multiple processors with synchronization based on atomic read-modify-write operations.
Round-robin Arbiter Design and Generation

(Georgia Institute of Technology, 2002) Shin, Eung Seo ; Mooney, Vincent John, III ; Riley, George F.

In this paper, we introduce a Round-robin Arbiter Generator (RAG) tool. The RAG tool can generate a design for a Bus Arbiter (BA). The BA is able to handle the exact number of bus masters for both on-chip and off-chip buses specified by the user of RAG. RAG can also generate a distributed and parallel hierarchical Switch Arbiter (SA). The first contribution of this paper is the automated generation of a round-robin token passing BA to reduce time spent on arbiter design. The generated arbiter is fair, fast, and has a low and predictable worst-case wait time. The second contribution of this paper is the design and integration of a distributed fast arbiter, e.g., for a terabit switch, based on 4x4 and 2x2 switch arbiters (SAs). Using a 0.25 µm TSMC standard cell library from LEDA Systems [11, 15], we show the arbitration time of a 256x256 SA for a terabit switch and demonstrate that the SA generated by RAG meets the time constraint to achieve approximately six terabits of throughput in a typical network switch design. Furthermore, our generated SA performs better than the Ping-Pong Arbiter and Programmable Priority Encoder by a factor of 1.9X and 2.4X, respectively.
Automated Bus Generation for Multiprocessor SoC Design

(Georgia Institute of Technology, 2002) Ryu, Kyeong Keol ; Mooney, Vincent John, III

The performance of a system, especially a multiprocessor system, heavily depends upon the efficiency of its bus architecture. This paper presents a methodology to generate a custom bus system for a multiprocessor System-on-a-Chip (SoC). Our bus synthesis tool (BusSyn) uses this methodology to generate five different bus systems as examples: Bi-FIFO Bus Architecture (BFBA), Global Bus Architecture Version I (GBAVI), Global Bus Architecture Version III (GBAVIII), Hybrid bus architecture (Hybrid) and Split Bus Architecture (SplitBA). We verify and evaluate the performance of each bus system in the context of three applications: an Orthogonal Frequency Division Multiplexing (OFDM) wireless transmitter, an MPEG2 decoder and a database example. This methodology gives the designer a great benefit in fast design space exploration of bus architectures across a variety of performance-impacting factors such as bus types, processor types and software programming style. In this paper, we show that BusSyn can generate buses that achieve superior performance when compared to a simple General Global Bus Architecture (GGBA) (e.g., 41% reduction in execution time in the case of a database example). In addition, the bus architecture generated by BusSyn is designed in a matter of seconds instead of the weeks needed for the hand design of a custom bus system.
Instruction-level Reverse Execution for Debugging

(Georgia Institute of Technology, 2002) Akgul, Tankut ; Mooney, Vincent John, III

Reverse execution provides a programmer with the ability to return a program to a previous state in its execution history. The ability to execute a program in reverse is advantageous for shortening software development time. Conventional techniques for reverse execution rely on saving a state into a record before the state is destroyed. State saving introduces both memory and time overheads during forward execution. Our proposed method introduces a reverse execution methodology at the assembly instruction level with low memory and time overheads. The methodology generates from a program a reverse program by which a destroyed state is almost always regenerated rather than being restored from a record. This significantly reduces state saving. The methodology has been implemented on a PowerPC processor with a custom-made debugger. As compared to previous work using state saving techniques, the experimental results show 2.5X to 400X memory overhead reduction for the tested benchmarks. Furthermore, the results with the same benchmarks show an average of 4.1X to 5.7X reduction in execution time overhead.