Series
Master of Science in Computer Science

Series Type
Degree Series

Publication Search Results

  • Item
    Performance understanding and tuning of iterative computation using profiling techniques
    (Georgia Institute of Technology, 2010-05-18) Ozarde, Sarang Anil
    Most applications spend a significant amount of time in the iterative parts of a computation. They typically iterate over the same set of operations with different values, which depend either on inputs or on values calculated in previous iterations. Although loops capture some iterative behavior, in many cases such behavior is spread over the whole program, sometimes through recursion. Understanding the iterative behavior of a computation can be very useful for fine-tuning it. In this thesis, we present a profiling-based framework to understand and improve the performance of iterative computation. We capture the state of iterations in two aspects: (1) algorithmic state and (2) program state. We demonstrate the applicability of our framework for capturing algorithmic state by applying it to SAT solvers, and for capturing program state by applying it to a variety of benchmarks exhibiting completely parallelizable loops. Further, we show that such a performance characterization can be successfully used to improve the performance of the underlying application.
    Many high-performance combinatorial optimization applications involve SAT solving. A variety of SAT solvers have been developed that employ different data structures and different propagation methods for converging on a fixed point and generating a satisfiable solution. The performance debugging and tuning of SAT solvers for a given domain is an important problem encountered in practice. Unfortunately, not much work has been done to quantify the iterative efficiency of SAT solvers. In this work, we develop quantifiable measures for calculating the convergence efficiency of SAT solvers. Here, we capture the algorithmic state of the application by tracking the assignment of variables in each iteration. A compact representation of profile data is developed to track the rate of progress and convergence. The novelty of this approach is that it is independent of the specific strategies used in individual solvers, yet it gives key insights into the "progress" and "convergence behavior" of the solver in terms of the specific implementation at hand. An analysis tool is written to interpret the profile data and extract the values of metrics such as average convergence rate, efficiency of iteration, and variable stabilization. Finally, using this system, we produce a study of four well-known SAT solvers to compare their iterative efficiency using random as well as industrial benchmarks. Using the framework, iterative inefficiencies that lead to slow convergence are identified. We also show how to fine-tune the solvers by adapting the key steps.
    We also show that a similar profile data representation can be easily applied to loops in general to capture their program state. One of the key attributes of the program state inside loops is their branch behavior. We demonstrate the applicability of the framework by profiling completely parallelizable loops (no cross-iteration dependence) and by storing the branching behavior of each iteration. The branch behavior across a group of iterations is important in devising thread warps from parallel loops for efficient execution on GPUs. We show how some loops can be effectively parallelized on GPUs using this information.
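The idea of profiling a SAT solver by tracking variable assignments per iteration can be illustrated with a small sketch. The abstract does not define its metrics, so everything below is an assumption made for illustration: the snapshot format (one dictionary of variable values per iteration, with `None` for unassigned), the function names, and the formulas used for "average convergence rate" and "variable stabilization" are all hypothetical and may differ from the thesis.

```python
from typing import Dict, List, Optional

# Hypothetical profile format: one snapshot per solver iteration,
# mapping variable name -> True / False / None (None = unassigned).
Assignment = Dict[str, Optional[bool]]


def changed_per_iteration(snapshots: List[Assignment]) -> List[int]:
    """Count the variables whose value differs from the previous snapshot."""
    counts = []
    for prev, cur in zip(snapshots, snapshots[1:]):
        variables = set(prev) | set(cur)
        counts.append(sum(1 for v in variables if prev.get(v) != cur.get(v)))
    return counts


def stabilization_iteration(snapshots: List[Assignment]) -> Dict[str, int]:
    """Assumed metric: index of the last snapshot at which each variable changed."""
    last_change = {v: 0 for snap in snapshots for v in snap}
    pairs = zip(snapshots, snapshots[1:])
    for i, (prev, cur) in enumerate(pairs, start=1):
        for v in last_change:
            if prev.get(v) != cur.get(v):
                last_change[v] = i
    return last_change


def average_convergence_rate(snapshots: List[Assignment]) -> float:
    """Assumed metric: mean net gain in assigned variables per iteration."""
    assigned = [sum(1 for val in s.values() if val is not None) for s in snapshots]
    if len(assigned) < 2:
        return 0.0
    return (assigned[-1] - assigned[0]) / (len(assigned) - 1)
```

Because the sketch only looks at assignment snapshots, it does not depend on how a particular solver chooses or propagates values, which is the solver-independence property the abstract emphasizes. A variable that keeps changing late in the run (a high stabilization index) would be one candidate sign of slow convergence under these assumed definitions.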
  • Item
    An Optimization Framework for Embedded Processors with Auto-Modify Addressing Modes
    (Georgia Institute of Technology, 2004-12-08) Lau, ChokSheak
    Modern embedded processors with a dedicated address generation unit support memory accesses using indirect addressing modes with auto-increment and auto-decrement. The auto-increment/decrement mode, if properly utilized, can save address arithmetic instructions, reduce the static and dynamic footprint of the program, and speed up execution. We propose an optimization framework for embedded processors based on the auto-increment and auto-decrement addressing modes for address registers. Existing work on this class of optimizations focuses on building an access graph and finding a maximum-weight path cover to obtain an optimized layout of stack variables. We take this further by using coalescing, addressing mode selection, and offset registers to find further opportunities for reducing the number of load-address instructions required. We also propose an algorithm for building the layout that takes memory accesses across basic blocks into account, because existing work mainly considers intra-basic-block information. We then use the available offset registers to try to further reduce the number of address arithmetic instructions after layout assignment.
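The access-graph approach that the abstract attributes to existing work can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis's algorithm: it assumes a single address register, a step of exactly one slot for auto-increment/decrement, and a single basic block, and it uses a simple greedy heuristic to approximate the maximum-weight path cover. The function names and the example access sequence are hypothetical.

```python
from collections import Counter
from typing import Dict, List


def access_graph(seq: List[str]) -> Counter:
    """Edge weight = number of times two distinct variables are accessed back to back."""
    edges: Counter = Counter()
    for a, b in zip(seq, seq[1:]):
        if a != b:
            edges[frozenset((a, b))] += 1
    return edges


def greedy_layout(seq: List[str]) -> List[str]:
    """Greedy path cover: take the heaviest edges that keep every component a simple path."""
    variables = list(dict.fromkeys(seq))  # distinct variables in first-access order
    degree = {v: 0 for v in variables}
    neighbours: Dict[str, List[str]] = {v: [] for v in variables}
    parent = {v: v for v in variables}  # union-find, used to reject cycles

    def find(v: str) -> str:
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    # Heaviest edges first; ties are broken by name so the result is deterministic.
    ordered = sorted(access_graph(seq).items(), key=lambda kv: (-kv[1], sorted(kv[0])))
    for edge, _weight in ordered:
        a, b = sorted(edge)
        if degree[a] < 2 and degree[b] < 2 and find(a) != find(b):
            parent[find(a)] = find(b)
            degree[a] += 1
            degree[b] += 1
            neighbours[a].append(b)
            neighbours[b].append(a)

    # Walk each path from one of its endpoints and concatenate the paths.
    layout: List[str] = []
    seen = set()
    for v in variables:
        if v in seen or degree[v] == 2:
            continue
        prev, cur = None, v
        while cur is not None:
            layout.append(cur)
            seen.add(cur)
            nxt = [n for n in neighbours[cur] if n != prev]
            prev, cur = cur, (nxt[0] if nxt else None)
    return layout


def explicit_address_updates(seq: List[str], layout: List[str]) -> int:
    """Count consecutive accesses that auto-increment/decrement by one slot cannot cover."""
    pos = {v: i for i, v in enumerate(layout)}
    return sum(1 for a, b in zip(seq, seq[1:]) if abs(pos[a] - pos[b]) > 1)
```

For the hypothetical access sequence `a b c d a d a c a d a c`, placing the variables in first-access order (`a, b, c, d`) leaves eight transitions that need an explicit address update, while the greedy layout places the frequently paired variables next to each other and leaves two. The coalescing, addressing mode selection, offset register, and cross-basic-block techniques proposed in the abstract go beyond this sketch.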