On The Use of Over-Approximate Analysis in Support of Software Development and Testing
Author(s)
Rutledge, Richard
Abstract
The effectiveness of dynamic program analyses, such as profiling and memory-leak detection, crucially depends on the quality of the test inputs. However, adequate sets of inputs are rarely available. Existing automated input-generation techniques can help, but they tend to be either too expensive or ineffective. For example, traditional symbolic execution scales poorly to real-world programs, and random input generation may never reach deep states within the program.
To achieve scalable, effective, automated input generation that better supports dynamic analysis, I propose an approach that extends traditional symbolic execution by targeting increasingly small fragments of a program. The approach starts by generating inputs for the whole program and progressively introduces additional unconstrained state until it reaches a given program-coverage objective. It is applicable to any client dynamic analysis that requires high coverage and also tolerates over-approximated program behavior, that is, behavior that cannot occur in a complete execution.
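The progressive loop described above can be illustrated with a toy sketch. Everything here is a hypothetical stand-in rather than the dissertation's implementation: `CALL_GRAPH` and `BLOCKS` model a tiny program, and `explore` replaces a real symbolic-execution engine by assuming that a run started at some entry function only reaches functions within `budget` call-graph hops (a crude model of path explosion). The point is the control flow: run from the whole-program entry first, then restart from functions that still have uncovered blocks, treating their arguments and globals as unconstrained, until the coverage objective is met.

```python
from collections import deque

# Hypothetical toy program: call graph and the basic blocks each function owns.
CALL_GRAPH = {
    "main": ["parse", "run"],
    "parse": ["lex"],
    "run": ["step"],
    "lex": [],
    "step": ["emit"],
    "emit": [],
}
BLOCKS = {f: {f"{f}:b{i}" for i in range(2)} for f in CALL_GRAPH}


def explore(entry, budget=1):
    """Stand-in for one symbolic-execution run started at `entry` with
    unconstrained arguments/globals. Path explosion is modeled crudely:
    the run only reaches functions within `budget` call-graph hops."""
    reached, seen = set(), {entry}
    frontier = deque([(entry, 0)])
    while frontier:
        fn, depth = frontier.popleft()
        reached |= BLOCKS[fn]
        if depth < budget:
            for callee in CALL_GRAPH[fn]:
                if callee not in seen:
                    seen.add(callee)
                    frontier.append((callee, depth + 1))
    return reached


def progressive_under_constrained(objective=1.0, root="main"):
    """Start from the whole program, then add smaller fragments as entry
    points until the block-coverage objective is reached."""
    all_blocks = set().union(*BLOCKS.values())
    covered = explore(root)  # step 1: whole-program symbolic execution
    entries = [root]
    while len(covered) / len(all_blocks) < objective:
        # step 2: pick a function with uncovered blocks and restart there;
        # its unconstrained state over-approximates what real callers pass in
        pending = [f for f in CALL_GRAPH if BLOCKS[f] - covered and f not in entries]
        if not pending:
            break
        entries.append(pending[0])
        covered |= explore(pending[0])
    return entries, len(covered) / len(all_blocks)
```

With the toy graph, the whole-program run covers only `main`, `parse`, and `run`; the loop then adds `lex` and `step` as fragment entry points to reach full coverage. Choosing the first pending function is an arbitrary policy chosen for the sketch; a real tool might prefer deeper or less-covered fragments.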
To assess the effectiveness of my approach, I applied it to two client techniques. The first technique infers the actual path taken by a program execution by observing the CPU's electromagnetic emanations; it requires inputs to build a model that can recognize executed path segments.
The client inference works by piecewise matching of the observed emanation waveform against the waveforms recorded in the model.
It requires the model to be complete (i.e., to contain every piece), and the waveforms are sufficiently distinct that including extra samples is unlikely to cause a misinference.
After applying my approach to generate inputs covering all subsegments of the program’s execution paths, I designed a source generator to automatically construct a harness and scaffolding to replay these inputs against fragments of the original program.
The inference client constructs the model by recording the harness execution.
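The piecewise matching can be sketched as nearest-neighbor classification of fixed-size windows. This is a simplification under stated assumptions, not the client's actual algorithm: the hypothetical `model` maps a path-segment label to one recorded waveform of exactly `window` samples, and the observed trace is cut into consecutive, non-overlapping windows. Real emanation traces would need alignment and variable-length segments.

```python
import math


def distance(a, b):
    """Euclidean distance between two equal-length sample windows."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def infer_path(observed, model, window):
    """Label each consecutive `window`-sample piece of `observed` with the
    model segment whose recorded waveform is closest (nearest neighbor)."""
    path = []
    for start in range(0, len(observed) - window + 1, window):
        piece = observed[start:start + window]
        best = min(model, key=lambda seg: distance(piece, model[seg]))
        path.append(best)
    return path


# Hypothetical model recorded from harness runs: segment label -> waveform.
model = {"A->B": [0, 1, 0, 1], "B->C": [5, 5, 5, 5], "B->D": [0, 0, 9, 9]}
observed = [0.1, 1.0, 0.0, 0.9, 0, 0.2, 8.8, 9.1, 5, 4.9, 5.1, 5]
print(infer_path(observed, model, 4))  # ['A->B', 'B->D', 'B->C']
```

The sketch also shows why completeness matters: if `"B->D"` were missing from the model, its window would still be assigned to whichever remaining segment is nearest, silently producing a wrong path. Conversely, an extra entry recorded from an infeasible fragment only matters if its waveform happens to be closer than the true segment's.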
The second technique performs automated regression testing by identifying behavioral differences between two program versions and requires inputs to perform differential testing.
It explores local behavior in a neighborhood of the program changes by generating inputs for functions near the modified code, as measured by call-graph distance.
The inputs are then concretely executed on both versions, periodically checking internal state for behavioral differences.
The technique requires high-coverage inputs for a thorough examination and tolerates infeasible local state, since both versions are likely to execute it equivalently.
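Two pieces of this workflow lend themselves to a short sketch: selecting the neighborhood of a change and comparing the two versions at checkpoints. Both functions below are hypothetical illustrations, not the technique's implementation. `neighborhood` assumes "near" means at most `k` call-graph edges away in either the caller or the callee direction, computed by breadth-first search. `first_divergence` assumes each version, when run on an input, returns a list of internal-state snapshots taken at corresponding checkpoints.

```python
from collections import deque


def neighborhood(call_graph, changed, k):
    """Functions within k call-graph edges (caller or callee side) of any
    changed function, found by breadth-first search."""
    undirected = {f: set(callees) for f, callees in call_graph.items()}
    for f, callees in call_graph.items():
        for c in callees:
            undirected.setdefault(c, set()).add(f)
    dist = {f: 0 for f in changed}
    queue = deque(changed)
    while queue:
        f = queue.popleft()
        if dist[f] == k:
            continue  # do not expand beyond the distance bound
        for g in undirected.get(f, ()):
            if g not in dist:
                dist[g] = dist[f] + 1
                queue.append(g)
    return set(dist)


def first_divergence(old_fn, new_fn, x):
    """Run both versions on input x; each returns a list of internal-state
    snapshots. Return the first checkpoint index where they differ, else None."""
    old_states, new_states = old_fn(x), new_fn(x)
    for i, (a, b) in enumerate(zip(old_states, new_states)):
        if a != b:
            return i
    if len(old_states) != len(new_states):
        return min(len(old_states), len(new_states))
    return None
```

For example, if only `step` changed in a program where `run` calls `step` and `step` calls `emit`, then `k = 1` selects `step`, `run`, and `emit` as targets for input generation. Running generated inputs through `first_divergence` on the old and new versions reports where their internal state first disagrees.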
I will then present a separate technique to improve the coverage obtained by symbolic execution of floating-point programs.
This technique is equally applicable to both traditional symbolic execution and my progressively under-constrained symbolic execution.
Its key idea is to approximate floating-point expressions with fixed-point analogs.
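To make the fixed-point idea concrete, the sketch below evaluates a small floating-point expression tree using only integer arithmetic on scaled values. It is an assumption-laden illustration, not the dissertation's encoding: the Q-format with `FRAC_BITS = 16`, the tuple-based expression form, and the truncating (floor) rounding on multiply and divide are all choices made for the example, and overflow and range analysis are ignored. Because every operation becomes integer addition, multiplication, and shifting, constraints over such expressions fall into theories that integer or bit-vector solvers handle well.

```python
FRAC_BITS = 16             # hypothetical Q-format: 16 fractional bits
SCALE = 1 << FRAC_BITS


def to_fixed(v: float) -> int:
    """Encode a real value as a scaled integer (nearest representable)."""
    return round(v * SCALE)


def to_float(q: int) -> float:
    """Decode a scaled integer back to a float for inspection."""
    return q / SCALE


def fx_eval(expr, env):
    """Evaluate ('const', c), ('var', name), or (op, lhs, rhs) with
    op in add/sub/mul/div, using integer-only fixed-point arithmetic.
    `env` maps variable names to already-scaled integers."""
    op = expr[0]
    if op == "const":
        return to_fixed(expr[1])
    if op == "var":
        return env[expr[1]]
    a, b = fx_eval(expr[1], env), fx_eval(expr[2], env)
    if op == "add":
        return a + b
    if op == "sub":
        return a - b
    if op == "mul":
        return (a * b) >> FRAC_BITS      # drop the doubled fractional bits
    if op == "div":
        return (a << FRAC_BITS) // b     # pre-scale dividend to keep precision
    raise ValueError(f"unknown operator {op!r}")


# Fixed-point analog of the float expression x * 0.75 + 1.5
expr = ("add", ("mul", ("var", "x"), ("const", 0.75)), ("const", 1.5))
print(to_float(fx_eval(expr, {"x": to_fixed(2.0)})))  # 3.0
```

The analog is exact for this input because 2.0, 0.75, and 1.5 are all representable with 16 fractional bits; values such as 1/3 are only approximated, to within roughly 2^-16, which is the kind of imprecision the approach accepts in exchange for solver tractability.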
Finally, I will discuss future research directions, including further empirical evaluations and the investigation of other client analyses that could benefit from my approach.
Date
2022-12-01
Resource Type
Text
Resource Subtype
Dissertation