Organizational Unit:
School of Interactive Computing


Publication Search Results

  • Computable Phenotype and Active Learning for Acute Respiratory Distress Syndrome Classification
    (Georgia Institute of Technology, 2023-07-31) Pathak, Ashwin
    The accurate detection of acute respiratory distress syndrome (ARDS) is crucial in the Intensive Care Unit (ICU) because of its severe impact on organ function and the high mortality and morbidity it causes among critically ill patients. ARDS has various causes, with infection and trauma being the most common, and it is characterized by poor oxygenation despite mechanical ventilation. The Berlin criteria are the current gold standard for identifying ARDS, but their reliance on manual adjudication of chest radiographs limits automation. Since Electronic Medical Records (EMRs) do not typically record bilateral infiltrates, an automated approach to detecting radiological evidence would enable comprehensive study of the syndrome without costly image-by-image inspection by physicians. Natural Language Processing (NLP) offers an opportunity to analyze radiology notes and determine lung status from the text. In this study, an NLP pipeline was developed to analyze radiology notes of 362 Sepsis-3 criteria-fulfilling patients from the EMR, with the goal of identifying possible ARDS. After denoising and preprocessing, the notes were vectorized using BERT word embeddings and fed into a classification layer via transfer learning. The resulting classification models achieved F1-scores of 74.5% and 64.22% on the Emory and Grady datasets, respectively. While large language models perform well at ARDS detection, they typically require a substantial amount of training data. Active learning methods can reduce data requirements but may not consistently reach the desired performance level. This study therefore thoroughly evaluates different active learning query strategies in a human-in-the-loop scenario to reduce the burden of manual adjudication. Additionally, active learning methods do not indicate when the performance target has been reached, and evaluation is challenging without a separate held-out validation dataset.
Thus, the study explores the benefits of employing stopping criteria to recommend when to terminate the active learning process and assesses their effectiveness. The proposed active learning pipeline aims to continuously enhance the model's performance, resulting in an improved F1-score of 61.26% compared to random sampling baselines (59.96%), demonstrating the effectiveness of active learning methods in an imbalanced data setting.
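The transfer-learning step described in the abstract (frozen BERT embeddings feeding a lightweight classification head) can be sketched as follows. This is a minimal, hedged illustration: the `fake_bert_embed` function is a synthetic stand-in for a real frozen BERT encoder, and the logistic-regression head is one plausible choice of classification layer, not necessarily the one used in the thesis.

```python
# Sketch: frozen-encoder transfer learning for note classification.
# The real pipeline embeds radiology notes with BERT; here synthetic
# 32-dim "embeddings" stand in so the head-training step is runnable.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score

rng = np.random.default_rng(0)

def fake_bert_embed(n, label):
    # Placeholder for a frozen BERT encoder: positive notes cluster
    # around +1, negatives around -1 in embedding space.
    center = 1.0 if label else -1.0
    return center + 0.5 * rng.standard_normal((n, 32))

X = np.vstack([fake_bert_embed(100, 1), fake_bert_embed(100, 0)])
y = np.array([1] * 100 + [0] * 100)

# "Transfer learning" here means training only a lightweight
# classification head on top of the frozen embeddings.
head = LogisticRegression(max_iter=1000).fit(X, y)
print(round(f1_score(y, head.predict(X)), 2))
```

In practice the encoder output would come from a pretrained BERT model applied to the denoised note text, with only the head's parameters updated during training.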
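The active-learning loop with a stopping criterion can likewise be sketched. The query strategy (uncertainty sampling) and the stopping rule (pool-wide predictions stabilizing between rounds) shown here are illustrative stand-ins, not the thesis's exact strategies or criteria.

```python
# Sketch: pool-based active learning with uncertainty sampling and a
# simple stopping criterion. Data and thresholds are synthetic.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
# Synthetic 8-dim "note embeddings": positive vs. negative clusters.
X = np.vstack([rng.normal(1.0, 1.0, (200, 8)),
               rng.normal(-1.0, 1.0, (200, 8))])
y = np.array([1] * 200 + [0] * 200)

# Seed the labeled set with a few examples from each class.
labeled = [0, 1, 2, 200, 201, 202]
pool = [i for i in range(len(X)) if i not in labeled]

prev = None
for _ in range(40):  # cap on human-adjudication rounds
    clf = LogisticRegression(max_iter=1000).fit(X[labeled], y[labeled])
    pred = clf.predict(X)
    # Stopping criterion: terminate once predictions are stable
    # across consecutive rounds, avoiding a held-out validation set.
    if prev is not None and np.array_equal(pred, prev):
        break
    prev = pred
    # Uncertainty sampling: query the instance closest to the decision
    # boundary (posterior nearest 0.5) for manual adjudication.
    margin = np.abs(clf.predict_proba(X[pool])[:, 1] - 0.5)
    labeled.append(pool.pop(int(np.argmin(margin))))

print(len(labeled))  # labels consumed before the criterion fired
```

The appeal of such a criterion is exactly what the abstract notes: it tells the annotator when to stop without requiring a separate held-out validation dataset.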