Standardization of Engineering Requirements using Large Language Models

Tikayat Ray, Archana
Mavris, Dimitri N.
Requirements serve as the foundation for all systems, products, services, and enterprises. A well-formulated requirement conveys information that is necessary, clear, traceable, verifiable, and complete to the relevant stakeholders. Various types of requirements, such as functional, non-functional, design, quality, performance, and certification requirements, are used to define system functions and objectives, depending on the domain of interest and the system being designed. Organizations predominantly use natural language (NL) for requirements elicitation because it is easy for stakeholders with varying levels of experience to understand and use. In addition, NL lowers the barrier to entry compared to model-based languages such as the Unified Modeling Language (UML) and the Systems Modeling Language (SysML), which require training. Despite these advantages, NL requirements bring several drawbacks: ambiguities inherent in language, a tedious and error-prone manual review process, difficulty in verifying requirements completeness, and failure to recognize and use technical terms effectively. Although these drawbacks are not limited to a single domain or industry, this dissertation focuses on aerospace requirements. Most present-day systems are complex and warrant an integrated, holistic approach to their development that captures their numerous interrelationships. To address this need, there has been a paradigm shift from traditional document-based methods towards a model-centric approach to engineering. The model-centric approach shows great promise; however, the conversion of NL requirements into models is hindered by the ambiguities and inconsistencies in NL requirements.
As such, the objective of this dissertation is to identify, develop, and implement tools and techniques that enable or support the automated translation of NL requirements into standardized, semi-machine-readable requirements. Bidirectional Encoder Representations from Transformers (BERT), a transformer-based language model (LM), was selected for this research because 1) it can be fine-tuned for a variety of language tasks such as named-entity recognition (NER), part-of-speech (POS) tagging, and sentence classification, and 2) it can achieve state-of-the-art (SOTA) results. In addition, its bidirectional transformer-based architecture enables it to better capture the context of a sentence. BERT is pre-trained on BookCorpus and English Wikipedia (general-domain text) and, as a result, needs to be fine-tuned on an aerospace corpus to generalize to the aerospace domain. To fine-tune BERT for different NLP tasks, two annotated aerospace corpora were created. These corpora contain text from Parts 23 and 25 of Title 14 of the Code of Federal Regulations (CFR) and publications by the National Academy of Space Studies Board. Both corpora were open-sourced so that other researchers can use them, with the aim of accelerating research in the field of Natural Language Processing for Requirements Engineering (NLP4RE). First, the corpus annotated for aerospace-specific named entities (NEs) was used to fine-tune different variants of the BERT LM to identify five categories of named entities, namely, system names (SYS), resources (RES), values (VAL), organization names (ORG), and datetime (DATETIME). The extracted named entities were used to create a glossary, which is expected to improve the quality and understandability of aerospace requirements by ensuring uniform use of terminology.
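The glossary step can be illustrated with a minimal Python sketch. It assumes, hypothetically, that the NER predictions are available as word-level BIO tags (`B-SYS`, `I-SYS`, `O`, ...) over the five categories listed above; the tag scheme, the helper names `bio_to_entities` and `build_glossary`, and the example sentences and tags are illustrative assumptions written by hand, not output of aeroBERT-NER. The sketch merges tagged tokens into entity spans and collects the distinct terms per category.

```python
from collections import defaultdict

# The five categories named in the text; the BIO tag encoding is an assumption.
CATEGORIES = {"SYS", "RES", "VAL", "ORG", "DATETIME"}

def bio_to_entities(tokens, tags):
    """Merge BIO-tagged tokens into (entity_text, category) spans."""
    entities = []
    span, label = [], None
    for token, tag in zip(tokens, tags):
        if tag.startswith("I-") and tag[2:] == label:
            span.append(token)              # continue the current entity
            continue
        if span:                            # any other tag closes the open entity
            entities.append((" ".join(span), label))
        if tag.startswith(("B-", "I-")):
            span, label = [token], tag[2:]  # start a new entity
        else:
            span, label = [], None          # "O": outside any entity
    if span:
        entities.append((" ".join(span), label))
    return [(text, lab) for text, lab in entities if lab in CATEGORIES]

def build_glossary(tagged_requirements):
    """Collect the distinct terms found for each category, sorted for review."""
    glossary = defaultdict(set)
    for tokens, tags in tagged_requirements:
        for text, label in bio_to_entities(tokens, tags):
            glossary[label].add(text)
    return {label: sorted(terms) for label, terms in glossary.items()}

# Hypothetical, hand-tagged requirements standing in for model predictions.
tagged = [
    (["The", "fuel", "system", "must", "deliver", "fuel", "at", "100", "psi", "."],
     ["O", "B-SYS", "I-SYS", "O", "O", "B-RES", "O", "B-VAL", "I-VAL", "O"]),
    (["The", "FAA", "requires", "each", "fuel", "system", "to", "be", "inspected", "."],
     ["O", "B-ORG", "O", "O", "B-SYS", "I-SYS", "O", "O", "O", "O"]),
]

glossary = build_glossary(tagged)
print(glossary)
# prints: {'SYS': ['fuel system'], 'RES': ['fuel'], 'VAL': ['100 psi'], 'ORG': ['FAA']}
```

Grouping terms by category makes near-duplicates (for example, two spellings of the same system name) easy to spot and reconcile in a glossary. A real pipeline would also need to map BERT's WordPiece subword tokens back to whole words before this step.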
Second, the corpus annotated for aerospace requirements classification was used to fine-tune the BERT LM to classify requirements into types such as design, functional, and performance requirements. Being able to classify requirements improves the ability to conduct redundancy checks, evaluate consistency, and identify boilerplates, which are predefined linguistic patterns for standardizing requirements. Third, the off-the-shelf model flair/chunk-english was used to identify the different chunks in a requirement sentence, which helps in ordering the phrases of a sentence and is therefore useful for standardizing requirements. The capability to classify requirements, identify the named entities occurring in them, and extract the sentence chunks of aerospace requirements facilitated the creation of a requirements table and of boilerplates for converting NL requirements into semi-machine-readable requirements. Boilerplates were constructed for various types of requirements based on the frequency of different linguistic patterns. In summary, this effort resulted in the first open-source annotated aerospace corpora along with two LMs (aeroBERT-NER and aeroBERT-Classifier). Various methodologies were developed to use the fine-tuned LMs to standardize requirements by means of requirements boilerplates. As a result, this research will eventually contribute to speeding up the design and development process by reducing the ambiguities and inconsistencies associated with requirements. It is also expected to reduce the workload of engineers who manually evaluate large numbers of requirements by facilitating the conversion of NL aerospace requirements into standardized requirements.
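The frequency-based construction of boilerplates can likewise be sketched in Python. The sketch assumes each requirement has already been assigned a type and split into chunks carrying phrase labels such as `NP`, `VP`, `PP`, and `ADVP` (in the style of CoNLL-2000 chunk labels). The example requirements, their chunk tags, and the helper names `chunk_pattern`, `frequent_patterns`, and `to_boilerplate` are hypothetical and hand-written, not produced by flair/chunk-english or aeroBERT-Classifier. For each requirement type, the sketch counts chunk-tag sequences and turns the most frequent one into a placeholder template.

```python
from collections import Counter, defaultdict

def chunk_pattern(chunks):
    """Collapse a chunked requirement [(text, tag), ...] into its tag sequence."""
    return tuple(tag for _text, tag in chunks)

def frequent_patterns(requirements, top_k=1):
    """Return the top_k most frequent chunk-tag patterns for each requirement type."""
    counts = defaultdict(Counter)
    for req_type, chunks in requirements:
        counts[req_type][chunk_pattern(chunks)] += 1
    return {req_type: counter.most_common(top_k) for req_type, counter in counts.items()}

def to_boilerplate(pattern):
    """Render a tag sequence as a fill-in-the-blanks template."""
    return " ".join(f"<{tag}>" for tag in pattern)

# Hypothetical, hand-chunked requirements labeled with a requirement type.
requirements = [
    ("design", [("Each airplane", "NP"), ("must have", "VP"), ("a fuel system", "NP"),
                ("with", "PP"), ("two pumps", "NP")]),
    ("design", [("The cockpit", "NP"), ("must have", "VP"), ("a window", "NP"),
                ("on", "PP"), ("each side", "NP")]),
    ("design", [("The door", "NP"), ("must open", "VP"), ("outward", "ADVP")]),
    ("performance", [("The pump", "NP"), ("must deliver", "VP"), ("100 psi", "NP")]),
]

patterns = frequent_patterns(requirements)
for req_type, [(pattern, count)] in patterns.items():
    print(req_type, count, to_boilerplate(pattern))
# prints:
# design 2 <NP> <VP> <NP> <PP> <NP>
# performance 1 <NP> <VP> <NP>
```

Counting whole tag sequences is the simplest choice: two requirements share a pattern only if their chunk sequences match exactly. A fuller approach might first merge optional chunks or normalize modal verbs before counting.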