Aviation Safety QA Dataset for Extracting Knowledge From Incident Reports

Author(s)
Oderinde, Timilehin P.
Chandra, Chetan
Albertoli, Leslie
Bhanpato, Jirat
Bendarkar, Mayank V.
Associated Organization(s)
Organizational Unit
Daniel Guggenheim School of Aerospace Engineering
The Daniel Guggenheim School of Aeronautics was established in 1931 and renamed the School of Aerospace Engineering in 1962.
Abstract
Aviation safety has improved significantly through advances in technology, regulation, and proactive risk assessment. However, efficiently extracting actionable safety insights from extensive, text-heavy sources such as incident and accident reports remains a challenge. Natural Language Processing (NLP) models, particularly question-answering (QA) models, can make critical safety information considerably easier to access and analyze. This paper introduces an aviation-specific QA dataset designed to support the development of NLP models that answer domain-specific queries about safety incidents. The dataset pairs expert-crafted questions about key safety events and responses with incident narratives from the National Transportation Safety Board (NTSB) and Aviation Safety Reporting System (ASRS) databases. Answers were extracted with a semi-automated approach that uses few-shot prompting of the Llama 3.3 70B instruction-tuned language model, followed by human verification to ensure accuracy and relevance. The resulting QA dataset covers a wide range of safety scenarios and provides a resource for training models to retrieve precise safety information in response to specific queries. The paper also discusses the dataset’s structure, question types, and applications in risk analysis, incident investigation, and aviation safety research. This work contributes to integrating NLP with aviation safety and offers a foundation for developing robust, safety-oriented QA models. The dataset is available at https://huggingface.co/datasets/Timilehin674/Aviation_QA
Sponsor
Federal Aviation Administration through System Engineering and Technical Innovative Solutions (SETIS) Contract Number 693KA8-22-D-00025, Task Order Number 692M15-23-F-00178.
Date
2025-07
Resource Type
Text
Rights Statement
Unless otherwise noted, all materials are protected under U.S. Copyright Law and all rights are reserved.