Hardware assisted memory checkpointing and applications
in debugging and reliability

Doudalis, Ioannis

Title:

Hardware assisted memory checkpointing and applications in debugging and reliability

Files

doudalis_ioannis_201112_phd.pdf (3.21 MB)

Author(s)

Doudalis, Ioannis

Advisor(s)

Prvulovic, Milos

Advisor(s)

Person

Prvulovic, Milos

Associated Organization(s)

Organizational Unit

College of Computing

Organizational Unit

School of Computer Science

Collections

Theses and Dissertations

Permanent Link

http://hdl.handle.net/1853/42700

Abstract

The problems of software debugging and system reliability/availability are among the most challenging problems the computing industry is facing today, with direct impact on the development and operating costs of computing systems. A promising debugging technique that assists programmers identify and fix the causes of software bugs a lot more efficiently is bidirectional debugging, which enables the user to execute the program in "reverse", and a typical method used to recover a system after a fault is backwards error recovery, which restores the system to the last error-free state. Both reverse execution and backwards error recovery are enabled by creating memory checkpoints, which are used to restore the program/system to a prior point in time and re-execute until the point of interest. The checkpointing frequency is the primary factor that affects both the latency of reverse execution and the recovery time of the system; more frequent checkpoints reduce the necessary re-execution time. Frequent creation of checkpoints poses performance challenges, because of the increased number of memory reads and writes necessary for copying the modified system/program memory, and also because of software interventions, additional synchronization and I/O, etc., needed for creating a checkpoint. In this thesis I examine a number of different hardware accelerators, whose role is to create frequent memory checkpoints in the background, at minimal performance overheads. For the purpose of reverse execution, I propose the HARE and Euripus hardware checkpoint accelerators. HARE and Euripus create different types of checkpoints, and employ different methods for keeping track of the modified memory. As a result, HARE and Euripus have different hardware costs and provide different functionality which directly affects the latency of reverse execution. For improving the availability of the system, I propose the Kyma hardware accelerator. Kyma enables simultaneous creation of checkpoints at different frequencies, which allows the system to recover from multiple types of errors and tolerate variable error-detection latencies. The Kyma and Euripus hardware engines have similar architectures, but the functionality of the Kyma engine is optimized for further reducing the performance overheads and improving the reliability of the system. The functionality of the Kyma and Euripus engines can be combined into a unified accelerator that can serve the needs of both bidirectional debugging and system recovery.

Date Issued

2011-07-25

Resource Type

Text

Resource Subtype

Dissertation

Full item page

Title:

Hardware assisted memory checkpointing and applications in debugging and reliability

Files

Author(s)

Authors

Advisor(s)

Advisor(s)

Editor(s)

Associated Organization(s)

Series

Collections

Supplementary to

Permanent Link

Abstract

Sponsor

Date Issued

Extent

Resource Type

Resource Subtype

Rights Statement

Rights URI

Georgia Tech Library

Title: Hardware assisted memory checkpointing and applications in debugging and reliability

Files

Author(s)

Authors

Advisor(s)

Advisor(s)

Editor(s)

Associated Organization(s)

Series

Collections

Supplementary to

Permanent Link

Abstract

Sponsor

Date Issued

Extent

Resource Type

Resource Subtype

Rights Statement

Rights URI

Title:

Hardware assisted memory checkpointing and applications in debugging and reliability