KG-Enhanced Synthetic Report Generation for Addressing Class Imbalance in Aviation Safety Data
Author(s)
Jing, Xiao
Bhanpato, Jirat
Bendarkar, Mayank V.
Mavris, Dimitri N.
Abstract
This study presents a novel approach to address class imbalance in aviation safety data through knowledge-enhanced synthetic report generation. To overcome the limitations of existing methods in handling imbalanced event categories, we propose a two-part solution that combines a fine-tuned large language model (LLM) with a custom-built aircraft knowledge graph constructed from FAA data. Our approach introduces a dynamic retrieval and prompting strategy that reduces hallucination while incurring lower computational overhead than traditional Graph-RAG methods. The generated reports show improved technical accuracy, particularly in aircraft specifications and operational details, while adhering closely to NTSB reporting formats. This work advances both the methodology of domain-specific text generation and its practical application to aviation safety analysis through improved data balance.
Sponsor
This work was sponsored by the Federal Aviation Administration through the System Engineering and Technical Innovative Solutions (SETIS) Contract No. 693KA8-22-D-00025, Task Order No. 692M15-23-F-00178.
Date
2025-07-16
Resource Type
Text
Rights Statement
Unless otherwise noted, all materials are protected under U.S. Copyright Law and all rights are reserved.