Harrold, Mary Jean


Publication Search Results

Showing 1 - 10 of 21
  • Item
    Using Component Metadata to Support the Regression Testing of Component-Based Software
    (Georgia Institute of Technology, 2000) Harrold, Mary Jean ; Orso, Alessandro ; Rosenblum, David S. ; Rothermel, Gregg ; Soffa, Mary Lou ; Do, Hyunsook ; College of Computing
    Interest in component-based software continues to grow with the recognition of its potential in managing the increasing complexity of software systems. However, the use of externally-provided components has serious drawbacks, in most cases due to the lack of information about the components, for a wide range of activities in the engineering of component-based applications. Consider the activity of regression testing, whose high cost has been, and continues to be, a problem. In the case of component-based applications, regression testing can be even more expensive. When a new version of one or more components is integrated into an application, the lack of information about such externally-developed components makes it difficult to effectively determine the test cases that should be rerun on the resulting application. In previous work, we proposed the use of metadata, which are additional data provided with a component, to support software engineering tasks. In this paper, we present two new metadata-based techniques that address the problem of regression test selection for component-based applications: a code-based approach and a specification-based approach. First, using an example, we illustrate the two techniques. Then, we present a case study that applies the code-based technique to a real component-based system. The results of the study indicate that, on average, 26% of the overall testing effort can be saved over seven releases of the component-based system studied, with a maximum savings of 99% of the testing effort for one version. This reduction demonstrates that metadata can produce benefits in regression testing by reducing the costs related to this activity.
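The code-based selection approach described above can be sketched as follows. This is a minimal illustration, not the paper's actual algorithm: it assumes each component ships coverage metadata mapping test cases to the component methods they exercise, and selects the tests touching any changed method. All names and the metadata format are hypothetical.

```python
# Hedged sketch of metadata-based regression test selection: the component's
# metadata tells us which methods each test exercises, so when a new version
# changes some methods, we rerun only the tests that cover a changed method.

def select_tests(coverage_metadata, changed_methods):
    """Return the test cases that exercise at least one changed method.

    coverage_metadata: dict mapping test name -> set of covered methods
    changed_methods:   set of methods modified in the new component version
    """
    return {test for test, covered in coverage_metadata.items()
            if covered & changed_methods}

# Example metadata shipped with a hypothetical component.
metadata = {
    "t1": {"Account.open", "Account.close"},
    "t2": {"Account.deposit"},
    "t3": {"Account.deposit", "Account.withdraw"},
}
changed = {"Account.withdraw"}

print(select_tests(metadata, changed))  # only t3 exercises the change
```

In this toy setup only one of three tests must be rerun; the paper's case study reports savings of this kind averaging 26% across releases.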
  • Item
    SPA: Symbolic Program Approximation for Scalable Path-sensitive Analysis
    (Georgia Institute of Technology, 2009) Harrold, Mary Jean ; Santelices, Raul ; Center for Experimental Research in Computer Systems
    Symbolic execution is a static-analysis technique that has been used for applications such as test-input generation and change analysis. Symbolic execution’s path sensitivity makes scaling it difficult. Despite recent advances that reduce the number of paths to explore, the scalability problem remains. Moreover, there are applications that require the analysis of all paths in a program fragment, which exacerbate the scalability problem. In this paper, we present a new technique, called Symbolic Program Approximation (SPA), that performs an approximation of the symbolic execution of all paths between two program points by abstracting away certain symbolic subterms to make the symbolic analysis practical, at the cost of some precision. We discuss several applications of SPA, including testing of software changes and static invariant discovery. We also present a tool that implements SPA and an empirical evaluation on change analysis and testing that shows the applicability, effectiveness, and potential of our technique.
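The core idea of abstracting away differing symbolic subterms can be illustrated with a small sketch. This is not the paper's algorithm, only an analogy under simplifying assumptions: symbolic stores are plain dicts from variable names to term strings, and when states from two paths meet, any term that differs across the paths is replaced by a fresh symbol.

```python
# Illustrative sketch of path merging with abstraction: terms that agree
# across paths are kept exactly; terms that differ are approximated by a
# fresh symbol, giving one merged state at the cost of some precision.
import itertools

_fresh = itertools.count()

def merge_states(state_a, state_b):
    """Merge two symbolic stores (dicts: variable -> symbolic term)."""
    merged = {}
    for var in state_a.keys() | state_b.keys():
        ta, tb = state_a.get(var), state_b.get(var)
        merged[var] = ta if ta == tb else f"alpha{next(_fresh)}"
    return merged

# Two paths through an if-statement: x is path-dependent, y is not.
after_then = {"x": "n + 1", "y": "n * 2"}
after_else = {"x": "n - 1", "y": "n * 2"}
merged = merge_states(after_then, after_else)
print(merged)  # y stays "n * 2"; x becomes a fresh abstract symbol
```

After the merge, later analysis proceeds over one state instead of one per path, which is the source of the scalability gain.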
  • Item
    Understanding Data Dependences in the Presence of Pointers
    (Georgia Institute of Technology, 2003) Orso, Alessandro ; Sinha, Saurabh ; Harrold, Mary Jean ; Center for Experimental Research in Computer Systems
    Understanding data dependences in programs is important for many software-engineering activities, such as program understanding, impact analysis, reverse engineering, and debugging. The presence of pointers, arrays, and structures can cause subtle and complex data dependences that can be difficult to understand. For example, in languages such as C, an assignment made through a pointer dereference can assign a value to one of several variables, none of which may appear syntactically in that statement. In the first part of this paper, we describe two techniques for classifying data dependences in the presence of pointer dereferences. The first technique classifies data dependences based on definition type, use type, and path type. The second technique classifies data dependences based on span. We present empirical results to illustrate the distribution of data-dependence types and spans for a set of real C programs. In the second part of the paper, we discuss two applications of the classification techniques. First, we investigate different ways in which the classification can be used to facilitate data-flow testing and verification. We outline an approach that uses types and spans of data dependences to determine the appropriate verification technique for different data dependences; we present empirical results to illustrate the approach. Second, we present a new slicing paradigm that computes slices based on types of data dependences. Based on the new paradigm, we define an incremental slicing technique that computes a slice in multiple steps. We present empirical results to illustrate the sizes of incremental slices and the potential usefulness of incremental slicing for debugging.
  • Item
    Gamma System: Continuous Evolution of Software after Deployment
    (Georgia Institute of Technology, 2002) Orso, Alessandro ; Liang, Donglin ; Harrold, Mary Jean ; Lipton, Richard J. ; College of Computing
    In this paper, we present the Gamma system---a new approach for continuous improvement of software systems after their deployment. The Gamma system facilitates remote monitoring of deployed software using a revolutionary approach that exploits the opportunities presented by a software product being used by many users connected through a network. Gamma splits monitoring tasks across different instances of the software, so that partial information can be collected from different users by means of light-weight instrumentation, and integrated to gather the overall monitoring information. This system enables software producers (1) to perform continuous, minimally intrusive analyses of their software's behavior, and (2) to use the information thus gathered to improve and evolve their software. We describe the Gamma system and its underlying technology in detail, and illustrate the different components of the system. We also present a prototype implementation of the system and show our initial experiences with it.
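The task-splitting idea can be sketched in a few lines. This is a hedged illustration, not the Gamma implementation: it assumes the monitoring task is branch coverage, assigns probes to instances round-robin, and merges the partial reports by set union; the division and merge strategies are made up for the example.

```python
# Hedged sketch of split monitoring: each deployed instance carries
# lightweight instrumentation for only a slice of the branches, and the
# producer merges the partial coverage reports into an overall picture.

def assign_probes(branches, num_instances):
    """Round-robin assignment: instance i monitors every i-th branch."""
    return [branches[i::num_instances] for i in range(num_instances)]

def merge_reports(partial_reports):
    """Union the partial coverage collected from each instance."""
    covered = set()
    for report in partial_reports:
        covered |= report
    return covered

branches = ["b1", "b2", "b3", "b4", "b5", "b6"]
assignments = assign_probes(branches, 3)  # each instance watches 2 branches
# Suppose each instance reports the subset of its branches it saw executed.
reports = [{"b1"}, {"b2", "b5"}, {"b3"}]
print(merge_reports(reports))
```

No single user pays the full instrumentation cost, yet the merged report approximates what full instrumentation on one instance would have collected.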
  • Item
    Incremental Slicing Based on Data-Dependences Types
    (Georgia Institute of Technology, 2000) Orso, Alessandro ; Sinha, Saurabh ; Harrold, Mary Jean ; College of Computing
    Program slicing is useful for assisting with software-maintenance tasks, such as program understanding, debugging, impact analysis, and regression testing. The presence and frequent usage of pointers, in languages such as C, causes complex data dependences. To function effectively on such programs, slicing techniques must account for pointer-induced data dependences. Although many existing slicing techniques function in the presence of pointers, none of those techniques distinguishes data dependences based on their types. This paper presents a new slicing technique, in which slices are computed based on types of data dependences. This new slicing technique offers several benefits and can be exploited in different ways, such as identifying subtle data dependences for debugging purposes, computing reduced-size slices quickly for complex programs, and performing incremental slicing. In particular, this paper describes an algorithm for incremental slicing that increases the scope of a slice in steps, by incorporating different types of data dependences at each step. The paper also presents empirical results to illustrate the performance of the technique in practice. The experimental results show how the sizes of the slices grow for different small- and medium-sized subjects. Finally, the paper presents a case study that explores a possible application of the slicing technique for debugging.
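The step-wise growth of a slice can be sketched as a reachability computation over typed dependence edges. This is an illustrative model, not the paper's algorithm: the dependence graph, the statement numbering, and the type labels ("direct", "pointer") are all hypothetical.

```python
# Minimal sketch of incremental slicing: a backward slice is grown in
# steps by enabling additional kinds of data dependences at each step.

def slice_with_types(deps, criterion, enabled_types):
    """Backward slice over only the dependence edges whose type is enabled.

    deps: list of (source_stmt, target_stmt, dep_type) edges, meaning
          target depends on source via a dependence of kind dep_type.
    """
    in_slice = {criterion}
    changed = True
    while changed:
        changed = False
        for src, tgt, dep_type in deps:
            if tgt in in_slice and dep_type in enabled_types and src not in in_slice:
                in_slice.add(src)
                changed = True
    return in_slice

deps = [
    (1, 3, "direct"),   # s3 uses a value assigned at s1
    (2, 3, "pointer"),  # s3 may use a value assigned via *p at s2
    (4, 2, "direct"),
]
step1 = slice_with_types(deps, 3, {"direct"})             # first step
step2 = slice_with_types(deps, 3, {"direct", "pointer"})  # next step
print(step1, step2)
```

The first step yields a small slice over only unambiguous dependences; enabling pointer-induced dependences in the next step grows it, which is the incremental behavior the abstract describes.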
  • Item
    Improving the Classification of Software Behaviors using Ensembles
    (Georgia Institute of Technology, 2005) Bowring, James Frederick ; Harrold, Mary Jean ; Rehg, James M. ; Center for Experimental Research in Computer Systems
    One approach to the automatic classification of program behaviors is to view these behaviors as the collection of all the program's executions. Many features of these executions, such as branch profiles, can be measured, and if these features accurately predict behavior, we can build automatic behavior classifiers from them using statistical machine-learning techniques. Two key problems in the development of useful classifiers are (1) the costs of collecting and modeling data and (2) the adaptation of classifiers to new or unknown behaviors. We address the first problem by concentrating on the properties and costs of individual features and the second problem by using the active-learning paradigm. In this paper, we present our technique for modeling a data-flow feature as a stochastic process exhibiting the Markov property. We introduce the novel concept of databins to summarize, as Markov models, the transitions of values for selected variables. We show by empirical studies that databin-based classifiers are effective. We also describe ensembles of classifiers and how they can leverage their components to improve classification rates. We show by empirical studies that ensembles of control-flow and data-flow based classifiers can be more effective than either component classifier.
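The databin idea can be sketched as follows. This is a simplified illustration under stated assumptions, not the paper's model: the binning scheme (sign of a numeric variable) and the monitored trace are made up, and only a single first-order Markov model is built from one execution.

```python
# Illustrative sketch of 'databins': successive values of a monitored
# variable are mapped to coarse bins, and the transitions between bins
# are summarized as a Markov transition-probability model.
from collections import defaultdict

def bin_value(v):
    """Coarse, illustrative binning of a numeric variable."""
    if v < 0:
        return "neg"
    if v == 0:
        return "zero"
    return "pos"

def markov_model(values):
    """Estimate bin-to-bin transition probabilities from one execution."""
    counts = defaultdict(lambda: defaultdict(int))
    bins = [bin_value(v) for v in values]
    for cur, nxt in zip(bins, bins[1:]):
        counts[cur][nxt] += 1
    return {cur: {nxt: n / sum(nxts.values()) for nxt, n in nxts.items()}
            for cur, nxts in counts.items()}

# Values of a hypothetical loop variable during one execution.
trace = [3, 1, 0, -2, -1, 4]
model = markov_model(trace)
print(model)  # model["pos"] gives P(next bin | current bin is "pos")
```

Models like this one, built per execution, become the feature vectors that the classifiers (and ensembles of classifiers) operate on.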
  • Item
    Visually Encoding Program Test Information to Find Faults in Software
    (Georgia Institute of Technology, 2001) Eagan, James Robinson, Jr. ; Harrold, Mary Jean ; Jones, James Arthur ; Stasko, John T. ; GVU Center
    Large test suites are frequently used to evaluate the correctness of software systems and to locate errors. Unfortunately, this process can generate a huge amount of data that is difficult to interpret manually. We have created a system called Tarantula that visually encodes test data to help find program errors. The system uses a principled color mapping to represent how particular source lines act in passed and failed tests. It also provides a flexible user interface for examining different perspectives that show the effects on source regions of test suites ranging from individual tests, to important subsets such as the set of failed tests, to the entire test suite.
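The principled color mapping is driven by the ratio of passed to failed tests executing each line; the same ratio was later formalized as Tarantula's suspiciousness score. The sketch below shows that score; the per-statement counts are hypothetical inputs, and the handling of statements executed by no tests is a simplifying choice of this example.

```python
# Tarantula-style scoring: a statement executed mostly by failed tests
# gets a score near 1 (rendered red), one executed mostly by passed
# tests a score near 0 (rendered green).

def suspiciousness(passed_s, failed_s, total_passed, total_failed):
    """High when the statement runs predominantly in failed tests."""
    passed_ratio = passed_s / total_passed if total_passed else 0.0
    failed_ratio = failed_s / total_failed if total_failed else 0.0
    if passed_ratio + failed_ratio == 0:
        return 0.0  # statement executed by no tests
    return failed_ratio / (passed_ratio + failed_ratio)

# Suite with 10 passed and 2 failed tests.
print(suspiciousness(1, 2, 10, 2))   # run by both failed tests: near 1
print(suspiciousness(10, 0, 10, 2))  # run only by passed tests: 0.0
```

Mapping this score onto a red-to-green hue, with brightness tied to how many tests execute the line, gives the visualization the abstract describes.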
  • Item
    A Technique for Dynamic Updating of Java Software
    (Georgia Institute of Technology, 2002) Orso, Alessandro ; Rao, Anup ; Harrold, Mary Jean ; College of Computing
    During maintenance, systems are updated to correct faults, improve functionality, and adapt the software to changes in its execution environment. The typical software-update process consists of stopping the system to be updated, performing the actual update of the code, and restarting the system. For systems such as banking and telecommunication software, however, the cost of downtime can be prohibitive. The situation is even worse for systems such as air-traffic controllers and life-support software, for which a shut-down is in general not an option. In those cases, the use of some form of on-the-fly program modification is required. In this paper, we propose a new technique for dynamic updating of Java software. Our technique is based on the use of proxy classes and does not require any support from the runtime system. The technique allows for updating a running Java program by substituting, adding, and deleting classes. We also present Dusc (Dynamic Updating through Swapping of Classes), a tool that we developed and that implements our technique. Finally, we describe an empirical study that we performed to validate the technique on a real Java subject. The results of the study show that our technique can be effectively applied to Java software with little overhead in both execution time and program size.
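The proxy-based idea can be sketched compactly, here in Python rather than Java for brevity. This is an analogy, not Dusc itself: clients hold a proxy whose delegate can be swapped while the program runs, so the implementation class is replaced without a restart. The class names and the interest-rate change are invented for the example.

```python
# Hedged sketch of class swapping through a proxy: every call is forwarded
# to the current delegate, and swap() installs a new implementation at
# run time without stopping the program.

class AccountV1:
    def interest(self, balance):
        return balance / 100  # 1% under the old rule

class AccountV2:
    def interest(self, balance):
        return balance / 50   # 2% under the updated rule

class Proxy:
    """Forwards every attribute access to the current delegate."""
    def __init__(self, delegate):
        self._delegate = delegate
    def swap(self, new_delegate):
        self._delegate = new_delegate
    def __getattr__(self, name):
        return getattr(self._delegate, name)

account = Proxy(AccountV1())
before = account.interest(1000)  # computed by version 1
account.swap(AccountV2())        # update while "running"
after = account.interest(1000)   # computed by version 2
print(before, after)
```

In the Java setting the paper targets, the proxy must preserve the original class's interface so client bytecode is unaffected; the Python `__getattr__` forwarding above stands in for that generated delegation code.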
  • Item
    Probabilistic Slicing for Predictive Impact Analysis
    (Georgia Institute of Technology, 2010) Santelices, Raul ; Harrold, Mary Jean ; Center for Experimental Research in Computer Systems
    Program slicing is a technique that determines which statements in a program affect or are affected by another statement in that program. Static forward slicing, in particular, can be used for impact analysis by identifying all potential effects of changes in software. This information helps developers design and test their changes. Unfortunately, static slicing is too imprecise—it often produces large sets of potentially affected statements, limiting its usefulness. To reduce the resulting set of statements, other forms of slicing have been proposed, such as dynamic slicing and thin slicing, but they can miss relevant statements. In this paper, we present a new technique, called Probabilistic Slicing (p-slicing), that augments a static forward slice with a relevance score for each statement by exploiting the observation that not all statements have the same probability of being affected by a change. P-slicing can be used, for example, to focus the attention of developers on the “most impacted” parts of the program first. It can also help testers, for example, by estimating the difficulty of “killing” a particular mutant in mutation testing and prioritizing test cases. We also present an empirical study that shows the effectiveness of p-slicing for predictive impact analysis and we discuss potential benefits for other tasks.
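The scoring idea can be sketched as a propagation computation over the forward slice. This is an illustrative simplification, not the paper's probabilistic model: it assumes each dependence edge carries a made-up propagation probability, and scores a statement by the highest product of edge probabilities over any path from the changed statement.

```python
# Illustrative sketch of relevance scoring for p-slicing: statements in
# the static forward slice are ranked by an estimate of how likely the
# change's effect is to propagate to them.

def relevance_scores(edges, changed):
    """edges: dict (src, tgt) -> probability the effect propagates src->tgt."""
    scores = {changed: 1.0}
    frontier = [changed]
    while frontier:
        node = frontier.pop()
        for (src, tgt), p in edges.items():
            if src == node:
                candidate = scores[node] * p
                if candidate > scores.get(tgt, 0.0):
                    scores[tgt] = candidate
                    frontier.append(tgt)
    return scores

# Dependence edges out of a changed statement s1, with made-up probabilities.
edges = {("s1", "s2"): 0.9, ("s2", "s3"): 0.5, ("s1", "s3"): 0.2}
scores = relevance_scores(edges, "s1")
print(scores)  # s3 scores 0.45 via s2, higher than 0.2 directly
```

Ranking statements by such scores is what lets developers look at the "most impacted" parts of the program first, rather than an undifferentiated static slice.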
  • Item
    A Framework for Understanding Data Dependences
    (Georgia Institute of Technology, 2002) Orso, Alessandro ; Liang, Donglin ; Sinha, Saurabh ; Harrold, Mary Jean ; College of Computing
    Identifying and understanding data dependences is important for a variety of software-engineering tasks. The presence of pointers, arrays, and dynamic memory allocation introduces subtle and complex data dependences that may be difficult to understand. In this paper, we present a refinement of our previously developed classification that also distinguishes the types of memory locations, considers interprocedural data dependences, and further distinguishes such data dependences based on the kinds of interprocedural paths on which they occur. This new classification enables reasoning about the complexity of data dependences in programs using features such as pointers, arrays, and dynamic memory allocation. We present an algorithm for computing interprocedural data dependences according to our classification. To evaluate the classification, we compute the distribution of data dependences for a set of real C programs and we discuss how the distribution can be useful in understanding the characteristics of a program. We also evaluate how alias information provided by different algorithms, varying in precision, affects the distribution. Finally, we investigate how the classification can be exploited to estimate complexity of the data dependences in a program.