Title:
Standardization of Engineering Requirements using Large Language Models

dc.contributor.advisor Mavris, Dimitri N.
dc.contributor.author Tikayat Ray, Archana
dc.contributor.committeeMember Pinon Fischer, Olivia J.
dc.contributor.committeeMember Schrage, Daniel P.
dc.contributor.committeeMember Cole, Bjorn F.
dc.contributor.committeeMember White, Ryan T.
dc.contributor.department Aerospace Engineering
dc.date.accessioned 2023-05-18T17:54:32Z
dc.date.available 2023-05-18T17:54:32Z
dc.date.created 2023-05
dc.date.issued 2023-04-27
dc.date.submitted May 2023
dc.date.updated 2023-05-18T17:54:33Z
dc.description.abstract Requirements serve as the foundation for all systems, products, services, and enterprises. A well-formulated requirement conveys information that must be necessary, clear, traceable, verifiable, and complete to the respective stakeholders. Various types of requirements, such as functional, non-functional, design, quality, performance, and certification requirements, are used to define system functions/objectives based on the domain of interest and the system being designed. Organizations predominantly use natural language (NL) for requirements elicitation because it is easy for stakeholders with varying levels of experience to understand and use. In addition, NL lowers the barrier to entry when compared to model-based languages such as Unified Modeling Language (UML) and Systems Modeling Language (SysML), which require training. Despite these advantages, NL requirements bring along many drawbacks, such as ambiguities associated with language, a tedious and error-prone manual examination process, difficulties associated with verifying requirements completeness, and failure to recognize and use technical terms effectively. While the drawbacks associated with using NL for requirements engineering are not limited to a single domain or industry, the focus of this dissertation is on aerospace requirements. Most present-day systems are complex and warrant an integrated and holistic approach to their development to capture the numerous interrelationships. To address this need, there has been a paradigm shift towards a model-centric approach to engineering, as compared to traditional document-based methods. The model-centric approach shows great promise; however, the conversion of NL requirements into models is hindered by the ambiguities and inconsistencies in NL requirements.
As such, the objective of this dissertation is to identify, develop, and implement tools and techniques to enable/support the automated translation of NL requirements into standardized/semi-machine-readable requirements. Bidirectional Encoder Representations from Transformers (BERT), a transformer-based language model (LM), was selected for this research because 1) it can be fine-tuned for a variety of language tasks such as named-entity recognition (NER), part-of-speech (POS) tagging, and sentence classification, and 2) it can achieve state-of-the-art (SOTA) results. In addition, it uses a bidirectional transformer-based architecture, enabling it to better capture the context in a sentence. BERT is pre-trained on BookCorpus and English Wikipedia (general-domain text) and, as a result, needs to be fine-tuned using an aerospace corpus to be able to generalize to the aerospace domain. To fine-tune BERT for different NLP tasks, two annotated aerospace corpora were created. These corpora contain text from Parts 23 and 25 of Title 14 of the Code of Federal Regulations (CFR) and publications by the National Academy of Space Studies Board. Both corpora were open-sourced to make them available to other researchers and to accelerate research in the field of Natural Language Processing for Requirements Engineering (NLP4RE). First, the corpus annotated for aerospace-specific named entities (NEs) was used to fine-tune different variants of the BERT LM for the identification of five categories of named entities, namely, system names (SYS), resources (RES), values (VAL), organization names (ORG), and datetime (DATETIME). The extracted named entities were used to create a glossary, which is expected to improve the quality and understandability of aerospace requirements by ensuring uniform use of terminology.
Second, the corpus annotated for aerospace requirements classification was used to fine-tune the BERT LM to classify requirements into different types such as design requirements, functional requirements, and performance requirements. Being able to classify requirements will improve the ability to conduct redundancy checks, evaluate consistency, and identify boilerplates, which are pre-defined linguistic patterns for standardizing requirements. Third, an off-the-shelf model, flair/chunk-english, was used for identifying the different sentence chunks in a requirement sentence, which is helpful for ordering phrases in a sentence and hence useful for the standardization of requirements. The capability to classify requirements, identify named entities occurring in requirements, and extract different sentence chunks in aerospace requirements facilitated the creation of a requirements table and boilerplates for the conversion of NL requirements into semi-machine-readable requirements. Based on the frequency of different linguistic patterns, boilerplates were constructed for various types of requirements. In summary, this effort resulted in the development of the first open-source annotated aerospace corpora along with two LMs (aeroBERT-NER and aeroBERT-Classifier). Various methodologies were developed to use the fine-tuned LMs to standardize requirements by making use of requirements boilerplates. As a result, this research will eventually contribute to speeding up the design and development process by reducing ambiguities and inconsistencies associated with requirements. In addition, it is expected to help reduce the workload on engineers who manually evaluate a large number of requirements by facilitating the conversion of NL aerospace requirements into standardized requirements.
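The glossary step described in the abstract can be illustrated with a minimal sketch. It assumes the NER model emits one tag per token in a BIO scheme (B-/I- prefixes plus O); the abstract names the five entity categories but does not state the tagging scheme, so that is an assumption here. The function names `group_entities` and `build_glossary` are hypothetical, and the example tags are hand-written for illustration, not actual model output.

```python
from collections import defaultdict

# The five entity categories named in the abstract.
ENTITY_TYPES = {"SYS", "RES", "VAL", "ORG", "DATETIME"}


def group_entities(tokens, tags):
    """Merge BIO-tagged tokens into (text, type) spans.

    Assumes a BIO scheme (an assumption, not stated in the abstract).
    An I- tag that does not continue the current span is treated like O.
    """
    spans = []
    current_tokens, current_type = [], None
    for token, tag in zip(tokens, tags):
        prefix, _, label = tag.partition("-")
        if prefix == "B" and label in ENTITY_TYPES:
            # A new entity starts; close any span that is still open.
            if current_tokens:
                spans.append((" ".join(current_tokens), current_type))
            current_tokens, current_type = [token], label
        elif prefix == "I" and current_tokens and label == current_type:
            # Continuation of the open span.
            current_tokens.append(token)
        else:
            # O tag or inconsistent I- tag: close the open span, if any.
            if current_tokens:
                spans.append((" ".join(current_tokens), current_type))
            current_tokens, current_type = [], None
    if current_tokens:
        spans.append((" ".join(current_tokens), current_type))
    return spans


def build_glossary(tagged_sentences):
    """Collect unique, lower-cased entity strings per entity type."""
    glossary = defaultdict(set)
    for tokens, tags in tagged_sentences:
        for text, label in group_entities(tokens, tags):
            glossary[label].add(text.lower())
    return {label: sorted(terms) for label, terms in glossary.items()}


if __name__ == "__main__":
    # Hand-written illustrative tags, not output of aeroBERT-NER.
    tokens = ["The", "airplane", "must", "carry", "fuel", "for", "30", "minutes"]
    tags = ["O", "B-SYS", "O", "O", "B-RES", "O", "B-VAL", "I-VAL"]
    print(build_glossary([(tokens, tags)]))
```

Lower-casing the terms is one simple way to merge surface variants of the same entity; a real glossary would likely need more careful normalization.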
dc.description.degree Ph.D.
dc.format.mimetype application/pdf
dc.identifier.uri https://hdl.handle.net/1853/72044
dc.language.iso en_US
dc.publisher Georgia Institute of Technology
dc.subject Requirement Engineering
dc.subject Natural Language Processing
dc.subject BERT
dc.subject Large Language Models
dc.subject Aerospace Engineering
dc.subject System Engineering
dc.title Standardization of Engineering Requirements using Large Language Models
dc.type Text
dc.type.genre Dissertation
dspace.entity.type Publication
local.contributor.advisor Mavris, Dimitri N.
local.contributor.corporatename College of Engineering
local.contributor.corporatename Daniel Guggenheim School of Aerospace Engineering
local.relation.ispartofseries Doctor of Philosophy with a Major in Aerospace Engineering
relation.isAdvisorOfPublication d355c865-c3df-4bfe-8328-24541ea04f62
relation.isOrgUnitOfPublication 7c022d60-21d5-497c-b552-95e489a06569
relation.isOrgUnitOfPublication a348b767-ea7e-4789-af1f-1f1d5925fb65
relation.isSeriesOfPublication f6a932db-1cde-43b5-bcab-bf573da55ed6
thesis.degree.level Doctoral
Files
Original bundle
Name: TIKAYATRAY-DISSERTATION-2023.pdf
Size: 16.99 MB
Format: Adobe Portable Document Format
License bundle
Name: LICENSE.txt
Size: 3.87 KB
Format: Plain Text