Person:
Harrold, Mary Jean

Publication Search Results

Now showing 1 - 10 of 17
  • Item
    SPA: Symbolic Program Approximation for Scalable Path-sensitive Analysis
    (Georgia Institute of Technology, 2009) Harrold, Mary Jean ; Santelices, Raul
    Symbolic execution is a static-analysis technique that has been used for applications such as test-input generation and change analysis. Symbolic execution’s path sensitivity makes scaling it difficult. Despite recent advances that reduce the number of paths to explore, the scalability problem remains. Moreover, there are applications that require the analysis of all paths in a program fragment, which exacerbates the scalability problem. In this paper, we present a new technique, called Symbolic Program Approximation (SPA), that performs an approximation of the symbolic execution of all paths between two program points by abstracting away certain symbolic subterms to make the symbolic analysis practical, at the cost of some precision. We discuss several applications of SPA, including testing of software changes and static invariant discovery. We also present a tool that implements SPA and an empirical evaluation on change analysis and testing that shows the applicability, effectiveness, and potential of our technique.
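    The core approximation can be pictured as bounding the depth of symbolic terms: subterms nested below a cutoff are replaced by fresh symbolic variables, keeping terms small at the cost of precision. A minimal sketch of that idea only; the Expr type, the depth cutoff, and the fresh-variable naming are illustrative assumptions, not SPA's actual abstraction:

        // Sketch: abstract away deep symbolic subterms by replacing them
        // with fresh symbolic variables (illustrative, not the SPA algorithm).
        import java.util.concurrent.atomic.AtomicInteger;

        public class SubtermAbstraction {
            // A symbolic expression: a variable/constant leaf or a binary operator node.
            record Expr(String op, Expr left, Expr right) {
                static Expr leaf(String name) { return new Expr(name, null, null); }
                boolean isLeaf() { return left == null && right == null; }
                public String toString() {
                    return isLeaf() ? op : "(" + left + " " + op + " " + right + ")";
                }
            }

            private static final AtomicInteger fresh = new AtomicInteger();

            // Replace any subterm nested deeper than maxDepth with a fresh symbol,
            // losing its internal structure but bounding the term's size.
            static Expr approximate(Expr e, int maxDepth) {
                if (e.isLeaf()) return e;
                if (maxDepth == 0) return Expr.leaf("alpha" + fresh.incrementAndGet());
                return new Expr(e.op(), approximate(e.left(), maxDepth - 1),
                                        approximate(e.right(), maxDepth - 1));
            }

            public static void main(String[] args) {
                Expr x = Expr.leaf("x"), y = Expr.leaf("y");
                Expr deep = new Expr("+", new Expr("*", new Expr("+", x, y), y), x);
                System.out.println(deep);                  // (((x + y) * y) + x)
                System.out.println(approximate(deep, 2));  // ((alpha1 * y) + x)
            }
        }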
  • Item
    Visualization of Exception Handling Constructs to Support Program Understanding
    (Georgia Institute of Technology, 2009) Shah, Hina ; Görg, Carsten ; Harrold, Mary Jean
    This paper presents a new visualization technique for supporting the understanding of exception-handling constructs in Java programs. To understand the requirements for such a visualization, we surveyed a group of software developers, and used the results of that survey to guide the creation of the visualizations. The technique presents the exception-handling information using three views: the quantitative view, the flow view, and the contextual view. The quantitative view provides a high-level view that shows the throw-catch interactions in the program, along with relative numbers of these interactions, at the package level, the class level, and the method level. The flow view shows the type-throw-catch interactions, illustrating information such as which exception types reach particular throw statements, which catch statements handle particular throw statements, and which throw statements are not caught in the program. The contextual view shows, for particular type-throw-catch interactions, the packages, classes, and methods that contribute to that exception-handling construct. We implemented our technique in an Eclipse plugin called EnHanCe and conducted a usability and utility study with participants in industry.
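    The quantitative view boils down to aggregating throw-catch interactions at three granularities. A minimal sketch of that aggregation, with hard-coded interaction data standing in for what a tool like EnHanCe would extract by analysis:

        // Sketch: count throw-catch interactions at package/class/method level.
        // The Interaction data is hard-coded here for illustration.
        import java.util.*;

        public class ThrowCatchCounts {
            record Interaction(String throwingMethod, String catchingMethod) {}

            // Truncate a "pkg.Class.method" name to the requested granularity
            // (1 = package, 2 = class, 3 = method).
            static String at(String qualifiedMethod, int level) {
                String[] parts = qualifiedMethod.split("\\.");
                return String.join(".", Arrays.copyOf(parts, Math.min(level, parts.length)));
            }

            static Map<String, Integer> summarize(List<Interaction> data, int level) {
                Map<String, Integer> counts = new TreeMap<>();
                for (Interaction i : data)
                    counts.merge(at(i.throwingMethod(), level) + " -> "
                               + at(i.catchingMethod(), level), 1, Integer::sum);
                return counts;
            }

            public static void main(String[] args) {
                List<Interaction> data = List.of(
                    new Interaction("io.Reader.read", "app.Main.run"),
                    new Interaction("io.Writer.write", "app.Main.run"),
                    new Interaction("io.Reader.close", "app.Main.shutdown"));
                System.out.println(summarize(data, 1)); // {io -> app=3}
                System.out.println(summarize(data, 2)); // {io.Reader -> app.Main=2, ...}
            }
        }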
  • Item
    HDCCSR: software self-awareness using dynamic analysis and Markov models
    (Georgia Institute of Technology, 2008-12-20) Harrold, Mary Jean ; Rugaber, Spencer ; Rehg, James M.
  • Item
    Improving the Classification of Software Behaviors using Ensembles
    (Georgia Institute of Technology, 2005) Bowring, James Frederick ; Harrold, Mary Jean ; Rehg, James M.
    One approach to the automatic classification of program behaviors is to view these behaviors as the collection of all the program's executions. Many features of these executions, such as branch profiles, can be measured, and if these features accurately predict behavior, we can build automatic behavior classifiers from them using statistical machine-learning techniques. Two key problems in the development of useful classifiers are (1) the costs of collecting and modeling data and (2) the adaptation of classifiers to new or unknown behaviors. We address the first problem by concentrating on the properties and costs of individual features and the second problem by using the active-learning paradigm. In this paper, we present our technique for modeling a data-flow feature as a stochastic process exhibiting the Markov property. We introduce the novel concept of databins to summarize, as Markov models, the transitions of values for selected variables. We show by empirical studies that databin-based classifiers are effective. We also describe ensembles of classifiers and how they can leverage their components to improve classification rates. We show by empirical studies that ensembles of control-flow and data-flow based classifiers can be more effective than either component classifier.
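    A databin can be pictured as a discretization of a variable's observed values, with the value sequence then summarized as a first-order Markov model over bin transitions. A minimal sketch; the bin boundaries and the value trace are invented for illustration:

        // Sketch: bin a variable's values and build a Markov transition matrix
        // over the bins (illustrative databin, not the paper's implementation).
        public class Databins {
            static int bin(double v, double[] bounds) {
                for (int i = 0; i < bounds.length; i++) if (v < bounds[i]) return i;
                return bounds.length;
            }

            public static void main(String[] args) {
                double[] bounds = {0.0, 10.0};            // 3 bins: <0, [0,10), >=10
                double[] trace  = {-2, 3, 5, 12, 11, 4};  // observed values of one variable
                int n = bounds.length + 1;
                double[][] m = new double[n][n];
                for (int t = 1; t < trace.length; t++)
                    m[bin(trace[t - 1], bounds)][bin(trace[t], bounds)]++;
                for (int i = 0; i < n; i++) {             // row-normalize counts to probabilities
                    double row = 0;
                    for (double c : m[i]) row += c;
                    for (int j = 0; j < n && row > 0; j++) m[i][j] /= row;
                }
                for (double[] row : m) System.out.println(java.util.Arrays.toString(row));
            }
        }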
  • Item
    Understanding Data Dependences in the Presence of Pointers
    (Georgia Institute of Technology, 2003) Orso, Alessandro ; Sinha, Saurabh ; Harrold, Mary Jean
    Understanding data dependences in programs is important for many software-engineering activities, such as program understanding, impact analysis, reverse engineering, and debugging. The presence of pointers, arrays, and structures can cause subtle and complex data dependences that can be difficult to understand. For example, in languages such as C, an assignment made through a pointer dereference can assign a value to one of several variables, none of which may appear syntactically in that statement. In the first part of this paper, we describe two techniques for classifying data dependences in the presence of pointer dereferences. The first technique classifies data dependences based on definition type, use type, and path type. The second technique classifies data dependences based on span. We present empirical results to illustrate the distribution of data-dependence types and spans for a set of real C programs. In the second part of the paper, we discuss two applications of the classification techniques. First, we investigate different ways in which the classification can be used to facilitate data-flow testing and verification. We outline an approach that uses types and spans of data dependences to determine the appropriate verification technique for different data dependences; we present empirical results to illustrate the approach. Second, we present a new slicing paradigm that computes slices based on types of data dependences. Based on the new paradigm, we define an incremental slicing technique that computes a slice in multiple steps. We present empirical results to illustrate the sizes of incremental slices and the potential usefulness of incremental slicing for debugging.
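    The first classification can be pictured as a small record over the three axes; the enum values below are illustrative placeholders, not the paper's exact taxonomy:

        // Sketch: a data dependence tagged by definition type, use type, and path type.
        public class DataDepClassification {
            enum AccessType { DIRECT, THROUGH_POINTER } // how the definition or use occurs
            enum PathType   { DEFINITE, POSSIBLE }      // whether the dependence holds on all paths

            record DataDependence(String def, String use, AccessType defType,
                                  AccessType useType, PathType pathType) {}

            public static void main(String[] args) {
                // In C, "*p = 0;" may define one of several variables, so a later
                // read of x is only a possible dependence through a pointer definition.
                System.out.println(new DataDependence("*p = 0", "y = x",
                        AccessType.THROUGH_POINTER, AccessType.DIRECT, PathType.POSSIBLE));
            }
        }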
  • Item
    Software Behavior: Automatic Classification and its Applications
    (Georgia Institute of Technology, 2003) Bowring, James Frederick ; Rehg, James M. ; Harrold, Mary Jean
    A program's behavior is ultimately the collection of all its executions. This collection is diverse, unpredictable, and generally unbounded. Thus it is especially suited to statistical analysis and machine learning techniques. We explore the thesis that 1st- and 2nd-order Markov models of event-transitions are effective predictors of program behavior. We present a technique that models program executions as Markov models, and a clustering method for Markov models that aggregates multiple program executions, yielding a statistical description of program behaviors. With this approach, we can train classifiers to recognize specific behaviors emitted by an execution without knowledge of inputs or outcomes. We evaluate an application of active learning to the efficient refinement of our classifiers by conducting three empirical studies that explore a scenario illustrating automated test plan augmentation. We present a set of potential research questions and applications that our work suggests.
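    The modeling step can be sketched as turning one execution, recorded as a sequence of events, into a first-order Markov model (a transition-probability map). The event names are invented, and the paper's clustering and classifier-training steps are omitted:

        // Sketch: first-order Markov model built from a branch-event trace.
        import java.util.*;

        public class ExecutionModel {
            static Map<String, Map<String, Double>> markov(List<String> events) {
                Map<String, Map<String, Integer>> counts = new HashMap<>();
                for (int t = 1; t < events.size(); t++)
                    counts.computeIfAbsent(events.get(t - 1), k -> new HashMap<>())
                          .merge(events.get(t), 1, Integer::sum);
                Map<String, Map<String, Double>> probs = new HashMap<>();
                counts.forEach((from, row) -> {
                    int total = row.values().stream().mapToInt(Integer::intValue).sum();
                    Map<String, Double> p = new HashMap<>();
                    row.forEach((to, c) -> p.put(to, (double) c / total));
                    probs.put(from, p);
                });
                return probs;
            }

            public static void main(String[] args) {
                // Events are branch outcomes, e.g. "b1T" = branch 1 taken.
                List<String> trace = List.of("b1T", "b2F", "b1T", "b2F", "b1F");
                System.out.println(markov(trace)); // b1T -> {b2F=1.0}, b2F -> {b1T=0.5, b1F=0.5}
            }
        }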
  • Item
    A Framework for Understanding Data Dependences
    (Georgia Institute of Technology, 2002) Orso, Alessandro ; Liang, Donglin ; Sinha, Saurabh ; Harrold, Mary Jean
    Identifying and understanding data dependences is important for a variety of software-engineering tasks. The presence of pointers, arrays, and dynamic memory allocation introduces subtle and complex data dependences that may be difficult to understand. In this paper, we present a refinement of our previously developed classification that also distinguishes the types of memory locations, considers interprocedural data dependences, and further distinguishes such data dependences based on the kinds of interprocedural paths on which they occur. This new classification enables reasoning about the complexity of data dependences in programs using features such as pointers, arrays, and dynamic memory allocation. We present an algorithm for computing interprocedural data dependences according to our classification. To evaluate the classification, we compute the distribution of data dependences for a set of real C programs and we discuss how the distribution can be useful in understanding the characteristics of a program. We also evaluate how alias information provided by different algorithms, varying in precision, affects the distribution. Finally, we investigate how the classification can be exploited to estimate complexity of the data dependences in a program.
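    The refinement adds axes such as the kind of memory location involved and the kind of interprocedural path the dependence crosses. A minimal sketch with illustrative category names (not the paper's exact ones):

        // Sketch: extra classification axes for interprocedural data dependences.
        public class InterproceduralDataDep {
            enum LocationKind { LOCAL, GLOBAL, HEAP }           // where the datum lives
            enum PathKind     { INTRAPROCEDURAL, CALL, RETURN } // how the def reaches the use

            record Dependence(String def, String use, LocationKind loc, PathKind path) {}

            public static void main(String[] args) {
                // A heap cell defined in one procedure and used after the call returns.
                System.out.println(new Dependence("p->f = 1 (in init)", "q->f (in main)",
                        LocationKind.HEAP, PathKind.RETURN));
            }
        }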
  • Item
    Gamma System: Continuous Evolution of Software after Deployment
    (Georgia Institute of Technology, 2002) Orso, Alessandro ; Liang, Donglin ; Harrold, Mary Jean ; Lipton, Richard J.
    In this paper, we present the Gamma system---a new approach for continuous improvement of software systems after their deployment. The Gamma system facilitates remote monitoring of deployed software using a revolutionary approach that exploits the opportunities presented by a software product being used by many users connected through a network. Gamma splits monitoring tasks across different instances of the software, so that partial information can be collected from different users by means of light-weight instrumentation, and integrated to gather the overall monitoring information. This system enables software producers (1) to perform continuous, minimally intrusive analyses of their software's behavior, and (2) to use the information thus gathered to improve and evolve their software. We describe the Gamma system and its underlying technology in detail, and illustrate the different components of the system. We also present a prototype implementation of the system and show our initial experiences with it.
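    The splitting idea can be sketched as giving each deployed instance only a slice of the instrumentation probes and unioning the partial reports on the producer's side; the round-robin assignment and probe names below are assumptions for illustration:

        // Sketch: partition probes across instances, then merge partial coverage.
        import java.util.*;

        public class SplitMonitoring {
            // Probe i goes to instance (i mod numInstances): each user runs
            // only a fraction of the instrumentation.
            static List<List<String>> assign(List<String> probes, int numInstances) {
                List<List<String>> slices = new ArrayList<>();
                for (int k = 0; k < numInstances; k++) slices.add(new ArrayList<>());
                for (int i = 0; i < probes.size(); i++)
                    slices.get(i % numInstances).add(probes.get(i));
                return slices;
            }

            public static void main(String[] args) {
                List<String> probes = List.of("b1", "b2", "b3", "b4", "b5");
                System.out.println(assign(probes, 2)); // [[b1, b3, b5], [b2, b4]]
                Set<String> coverage = new TreeSet<>();
                coverage.addAll(List.of("b1", "b5")); // report from instance 0
                coverage.addAll(List.of("b2"));       // report from instance 1
                System.out.println("covered: " + coverage);
            }
        }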
  • Item
    A Technique for Dynamic Updating of Java Software
    (Georgia Institute of Technology, 2002) Orso, Alessandro ; Rao, Anup ; Harrold, Mary Jean
    During maintenance, systems are updated to correct faults, improve functionality, and adapt the software to changes in its execution environment. The typical software-update process consists of stopping the system to be updated, performing the actual update of the code, and restarting the system. For systems such as banking and telecommunication software, however, the cost of downtime can be prohibitive. The situation is even worse for systems such as air-traffic controllers and life-support software, for which a shut-down is in general not an option. In those cases, the use of some form of on-the-fly program modification is required. In this paper, we propose a new technique for dynamic updating of Java software. Our technique is based on the use of proxy classes and does not require any support from the runtime system. The technique allows for updating a running Java program by substituting, adding, and deleting classes. We also present Dusc (Dynamic Updating through Swapping of Classes), a tool that we developed and that implements our technique. Finally, we describe an empirical study that we performed to validate the technique on a real Java subject. The results of the study show that our technique can be effectively applied to Java software with little overhead in both execution time and program size.
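    The shape of the update can be sketched at the object level: callers hold a proxy whose delegate is swapped at run time. Dusc itself works at the class level via bytecode rewriting, so this only conveys the flavor of the idea:

        // Sketch: a proxy forwards calls to an implementation that can be
        // replaced while the program keeps running.
        public class HotSwapProxy {
            interface Greeter { String greet(String name); }

            static class GreeterProxy implements Greeter {
                private volatile Greeter delegate;            // current version
                GreeterProxy(Greeter initial) { delegate = initial; }
                void swap(Greeter next) { delegate = next; }  // the "dynamic update"
                public String greet(String name) { return delegate.greet(name); }
            }

            public static void main(String[] args) {
                GreeterProxy g = new GreeterProxy(n -> "Hello, " + n);
                System.out.println(g.greet("world"));   // old behavior
                g.swap(n -> "Hi, " + n + "!");          // update without a restart
                System.out.println(g.greet("world"));   // new behavior
            }
        }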
  • Item
    Evaluating the Precision of Static Reference Analysis Using Profiling
    (Georgia Institute of Technology, 2002) Liang, Donglin ; Pennings, Maikel ; Harrold, Mary Jean
    Program analyses and optimization of Java programs require reference information that determines the instances that may be accessed through dereferences. Reference information can be computed using reference analysis. This paper presents a set of studies that evaluate the precision of some existing approaches for identifying instances and for computing reference information in a reference analysis. The studies use dynamic reference information collected during run-time as a lower bound approximation to the precise reference information. The studies measure the precision of an existing approach by comparing the information computed using the approach with the lower bound approximation. The paper also presents case studies that attempt to identify the cases under which an existing approach is not effective. The presented studies provide information that may guide the usage of existing reference analysis techniques and the development of new reference analysis techniques.