Application of Machine Learning Techniques to Parameter Selection for Flight Risk Identification

Mangortey, Eugene
Monteiro, Dylan J.
Ackley, Jamey
Gao, Zhenyu
Puranik, Tejas G.
Kirby, Michelle
Pinon, Olivia J.
Mavris, Dimitri N.
In recent years, the use of data mining and machine learning techniques for safety analysis, incident and accident investigation, and fault detection has gained traction in the aviation community. Flight data collected from recording devices contains a large number of heterogeneous parameters, sometimes numbering in the thousands on modern commercial aircraft. Data is collected continuously, adding to the ever-growing pool available for safety analysis. However, not all of the collected parameters are important from a risk and safety analysis perspective. Moreover, feeding thousands of parameters sampled at high frequency into modern analysis techniques such as machine learning may not be computationally tractable. As such, an intelligent and repeatable methodology for selecting a reduced set of significant parameters is required so that safety analysts can focus on the right parameters for risk identification. In this paper, a step-by-step methodology is proposed to down-select a reduced set of parameters that can be used for safety analysis. First, correlation analysis is conducted to remove highly correlated, duplicate, or redundant parameters from the data set. Second, a pre-processing step removes metadata and empty parameters. This step also considers requirements imposed by regulatory bodies such as the Federal Aviation Administration and by subject matter experts to further trim the list of parameters. Third, a clustering algorithm is used to group similar flights and identify abnormal operations and anomalies. A retrospective analysis is conducted on the clusters to identify their characteristics and impact on flight safety. Finally, analysis of variance techniques are used to identify which parameters are significant in the formation of the clusters. Visualization dashboards are created to analyze the cluster characteristics and parameter significance.
This methodology is employed on data from the approach phase of a representative single-aisle aircraft to demonstrate its application and robustness across heterogeneous data sets. It is envisioned that this methodology can be further extended to other phases of flight and aircraft.
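The correlation-filtering and analysis-of-variance steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the parameter names, the toy per-flight values, the 0.95 correlation threshold, and the cluster labels are all assumptions made for illustration, and the labels stand in for the output of whichever clustering algorithm is used. The sketch drops any parameter that is highly correlated with one already kept, then ranks the remaining parameters by a one-way ANOVA F statistic computed across the clusters.

```python
from math import sqrt
from statistics import mean


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either series is constant."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def drop_correlated(params, threshold=0.95):
    """Greedily keep a parameter only if |r| with every kept one is below threshold."""
    kept = []
    for name, series in params.items():
        if all(abs(pearson(series, params[k])) < threshold for k in kept):
            kept.append(name)
    return kept


def anova_f(values, labels):
    """One-way ANOVA F statistic for one parameter across cluster labels."""
    groups = {}
    for v, lab in zip(values, labels):
        groups.setdefault(lab, []).append(v)
    n, k = len(values), len(groups)
    if k < 2:
        raise ValueError("need at least two clusters")
    grand = mean(values)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups.values())
    ss_within = sum((v - mean(g)) ** 2 for g in groups.values() for v in g)
    if ss_within == 0:
        return float("inf")  # no variation inside any cluster
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Hypothetical per-flight summary values (one entry per flight).
flights = {
    "airspeed_kt": [140, 142, 138, 141, 165, 168],
    "airspeed_dup": [140.5, 142.5, 138.5, 141.5, 165.5, 168.5],  # redundant copy
    "sink_rate_fpm": [700, 720, 690, 710, 705, 715],
}
# Hypothetical cluster labels, as if produced by a clustering step.
cluster_labels = [0, 0, 0, 0, 1, 1]

kept = drop_correlated(flights)
f_scores = {name: anova_f(flights[name], cluster_labels) for name in kept}
for name in sorted(f_scores, key=f_scores.get, reverse=True):
    print(name, round(f_scores[name], 2))
```

In this toy data the redundant airspeed copy is removed by the correlation filter, and airspeed receives a much larger F statistic than sink rate, which marks it as the parameter that best separates the two assumed clusters. A real analysis would also convert each F statistic to a p-value and apply the same procedure to far more parameters and flights.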