Data-Efficient Machine Learning Methods in Healthcare Applications

Author(s)
Zheng, Zhiyang
Abstract
Machine learning (ML) offers transformative potential in healthcare, but its application is frequently constrained by data-efficiency challenges: large labeled datasets are difficult to acquire, privacy regulations limit data sharing, and extensive multi-modal data collection is burdensome. This dissertation introduces and validates novel data-efficient ML methodologies that address these limitations in two medical contexts, dental cone-beam computed tomography (CBCT) image analysis and Alzheimer's disease (AD) prediction. The methods target limited labeled data, potentially unnecessary multi-modal acquisitions, and privacy-preserving learning with incomplete, distributed datasets. Firstly, to overcome limited labeled data in dental CBCT segmentation, an anatomically constrained Dense U-Net was developed. This method integrates explicit anatomical knowledge as mathematical constraints within a regularized optimization framework, and it yields higher segmentation and lesion detection accuracy than standard deep learning models, even when trained on minimal patient data. Secondly, to reduce the burden of multi-modal data acquisition for predicting AD progression from mild cognitive impairment (MCI), the Uncertainty-aware Multi-modal Outcome Sequential (UMOS) framework was introduced. UMOS adaptively acquires modalities based on predictive uncertainty, achieving accuracy comparable to full-modality models while significantly reducing the need for costly scans such as PET for a majority of patients.
Thirdly, to enable collaborative learning on incomplete multi-modal data distributed across sites while preserving privacy, the Federated Variational Inference Cross-Modal Learning (FedVICML) framework was proposed. Combining federated learning with variational inference, FedVICML allows sites to share probabilistic latent parameters, which effectively handles missing modalities and improves predictive performance over isolated training, as demonstrated on ADNI data. Collectively, these studies present three innovative strategies (integrating domain knowledge, implementing adaptive data acquisition, and enabling privacy-preserving federated learning) that substantially enhance ML data efficiency in healthcare. They offer practical solutions for improving diagnostic and predictive accuracy in dentistry and neurology while mitigating critical challenges related to data scarcity, cost, and privacy.
Date
2025-05-08
Resource Type
Text
Resource Subtype
Dissertation