Organizational Unit:
School of Computational Science and Engineering


Publication Search Results

  • Item
    Robust Representation Learning and Real-Time Serving of Deep Models for Health Time Series
    (Georgia Institute of Technology, 2023-04-25) Xu, Yanbo
    Modern Electronic Health Record (EHR) systems provide large amounts of data that enable machine learning (ML) researchers to develop methods for improving healthcare. However, development in a clinical setting presents unique challenges in ML model training and serving. For example, EHR data are usually captured from multiple sources over time in noisy environments such as Intensive Care Units (ICUs). As a result, data are generated in the form of time series with multiple issues, including heterogeneity, missingness, and irregularity. Although ML methods such as deep neural networks have been successfully developed for many predictive health tasks, improvements are still needed to learn robust and efficient predictive models that can harness such multimodal, noisy, and massive time series data. In this dissertation, we aim to tackle the following fundamental problems in developing ML models for health time series:

    1. Multiple modalities in time series. Clinical time series are often generated on different devices at different frequencies. A typical ICU monitoring dataset can contain continuous signals like electrocardiogram (ECG), evenly charted tabular data like vital signs, and sparse discrete events like lab tests and medications. Simple binning methods on values can reduce rich information in dense data and mask important information in sparse data. To address this, we design an efficient ensembling algorithm for reweighting the models that are individualized for each data modality. Then, to better capture the underlying heterogeneity behind the multimodal data, we further design individualized embeddings per modality and fit a self-attention Transformer on top of them to fuse the EHR time series more robustly.

    2. Missing observations at random time steps. Data collection is often noisy in EHR systems. Missing or mis-timestamped data occur due to random device disconnections, patients' body movements, human errors, etc. Models that do not account for input missingness and noise can lead to overfitting and biased predictions. We incorporate stochastic differential equations into spatial-temporal modeling, enabling imputation of randomly missing fields in structural time series with support for uncertainty quantification. We further propose score-based diffusion models for generating missing data and denoising the observed discrete event sequences.

    3. Large unlabelled data available across different sites. True labels are expensive to obtain in clinical applications. Although input signals can be easily collected in EHR systems, many labels of interest still require manual annotation and retrospective data review by clinical experts. Thus, large amounts of unlabelled data, which can be collected across several different hospitals, become available to researchers, whereas only a small fraction is labelled. To address this challenge, we investigate self-supervised learning in deep models and learn robust representations from the large unlabelled data that can later be adapted and fine-tuned for downstream tasks.

    4. Timely serving in resource-limited systems. In clinical environments such as ICUs, care practitioners need to make appropriate decisions in a timely manner. Thus far, deep learning models have mainly been developed to increase prediction accuracy in healthcare, but few of them consider whether or how they can be served in real time in a resource-constrained deployment environment. To bridge the gap, we design cost-aware prediction pipelines that can cascade to differently sized models to balance prediction accuracy and serving cost.
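
The cascading idea in item 4 can be illustrated with a minimal sketch. This is not the dissertation's pipeline: the names `Stage` and `cascade_predict`, the confidence rule (distance of the predicted probability from 0.5), the toy models, and the cost values are all assumptions made for illustration. The sketch shows only the general pattern, in which a cheap model answers first and a more expensive model is called when the cheap one is not confident enough.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

# Hypothetical type: a model maps a feature vector to a probability in [0, 1].
Model = Callable[[Sequence[float]], float]


@dataclass
class Stage:
    name: str
    model: Model
    cost: float  # illustrative serving cost per call (arbitrary units)


def confidence(p: float) -> float:
    """Distance of a probability from 0.5, rescaled to [0, 1]."""
    return abs(p - 0.5) * 2.0


def cascade_predict(
    stages: List[Stage], x: Sequence[float], threshold: float
) -> Tuple[float, str, float]:
    """Run stages from cheapest to most expensive.

    Stops at the first stage whose confidence reaches `threshold`, or at the
    last stage. Returns (probability, name of answering stage, total cost).
    """
    if not stages:
        raise ValueError("at least one stage is required")
    total_cost = 0.0
    for i, stage in enumerate(stages):
        p = stage.model(x)
        total_cost += stage.cost
        is_last = i == len(stages) - 1
        if is_last or confidence(p) >= threshold:
            return p, stage.name, total_cost
    raise AssertionError("unreachable")


# Toy stand-ins for a small and a large model (assumptions, not real models).
def small_model(x: Sequence[float]) -> float:
    # Only confident when the first feature is extreme.
    if x[0] > 0.8:
        return 0.9
    if x[0] < 0.2:
        return 0.1
    return 0.5


def large_model(x: Sequence[float]) -> float:
    # Uses both features; clipped to [0, 1].
    return min(1.0, max(0.0, 0.5 * x[0] + 0.5 * x[1]))


STAGES = [Stage("small", small_model, 1.0), Stage("large", large_model, 10.0)]

if __name__ == "__main__":
    print(cascade_predict(STAGES, [0.9, 0.0], threshold=0.6))
    print(cascade_predict(STAGES, [0.5, 0.7], threshold=0.6))
```

In this sketch, raising `threshold` sends more inputs to the larger model (higher cost, potentially higher accuracy), while lowering it lets the small model answer more often. How the threshold and stage ordering would be chosen in a real clinical deployment is outside what the abstract specifies.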