Automated extraction and synthesis of biomedical data for AI-driven systematic review and meta-analysis

Author(s)
Kartchner, David
Editor(s)
Associated Organization(s)
Organizational Unit
School of Computational Science and Engineering
School established in May 2010
Supplementary to:
Abstract
Biomedical literature is not simply a record of scientific discovery; it also provides a platform for research exploration and optimized clinical practice. The purpose of this thesis is to develop and apply natural language processing methods to enhance and automate biomedical literature-based research inquiry. Specifically, we develop datasets, methods, and systems to enable AI-assisted systematic review and meta-analysis of clinical literature. We further validate the efficacy of these resources via several clinical case studies that demonstrate their value in identifying potential treatments for emerging diseases and elucidating the mechanisms by which diseases affect patients.
Qualitative systematic reviews thoroughly survey a particular medical topic to highlight relevant relationships and identify promising directions for future research. To enable faster systematic review of biomedical relationships, we build a knowledge graph of relationships between biomedical entities extracted from 33+ million research articles on PubMed. We pair this with an unsupervised graph ranking algorithm that identifies related concepts and their relationships from the literature. This graph and accompanying software package form a Literature Based Discovery (LBD) system that can comprehensively identify and rank disease risks, mechanisms, and candidate drugs for repurposing, in order to prioritize future clinical or experimental research.
Similarly, quantitative meta-analysis of clinical studies forms the gold standard for establishing clinical guidelines and best practice by calculating an aggregate effect size from a collection of smaller cohorts. Meta-analysis begins with a specific research question and then extracts study-specific data elements to form a large, synthetic statistical cohort. Currently, the process of selecting research articles and extracting relevant data is done manually, taking a year on average for each clinical meta-analysis.
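To make the idea of an aggregate effect size concrete, the following minimal sketch pools per-study effect estimates with fixed-effect inverse-variance weighting. This is a standard textbook technique chosen here for illustration; the abstract does not state which pooling model the thesis uses, and the function name `pool_fixed_effect` and the input values are hypothetical.

```python
import math


def pool_fixed_effect(effects, variances):
    """Pool study effects with fixed-effect inverse-variance weighting.

    Assumption: a fixed-effect model, used purely for illustration;
    it is not taken from the thesis.
    Returns (pooled_estimate, standard_error, 95% confidence interval).
    """
    if not effects or len(effects) != len(variances):
        raise ValueError("effects and variances must be non-empty and equal in length")
    if any(v <= 0 for v in variances):
        raise ValueError("variances must be positive")

    # Each study is weighted by the inverse of its variance,
    # so more precise studies contribute more to the pooled estimate.
    weights = [1.0 / v for v in variances]
    total_weight = sum(weights)

    estimate = sum(w * e for w, e in zip(weights, effects)) / total_weight
    std_error = math.sqrt(1.0 / total_weight)

    # Normal-approximation 95% confidence interval.
    half_width = 1.96 * std_error
    return estimate, std_error, (estimate - half_width, estimate + half_width)


# Hypothetical per-study effect sizes and variances (not real data).
estimate, std_error, interval = pool_fixed_effect([0.2, 0.4], [0.04, 0.04])
print(estimate, std_error, interval)
```

With equal variances the pooled estimate reduces to the simple mean of the study effects. A random-effects model would additionally estimate between-study variance before weighting.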
This thesis presents data and methodological resources that dramatically accelerate the process of qualitatively and quantitatively aggregating evidence from biomedical research. In doing so, we make the following contributions:
• We developed SemNet 2.0, a literature-based discovery software package that integrates 33+ million PubMed articles into a comprehensive knowledge graph using named entity recognition, entity linking, and relationship extraction. We performed real-world case studies to illustrate the efficacy of SemNet 2.0 for summarizing relationships and prioritizing future experimental and clinical research.
• We meticulously annotated two data resources, BioSift and TrialSieve, that enable efficient filtering of clinical studies and detailed extraction of study design and outcome information. To our knowledge, TrialSieve is the first dataset that enables the automated quantification of clinical outcomes for each group represented in a clinical study.
• We developed an interface to enable real-time, human-in-the-loop identification, filtering, and information extraction from clinical trials using large language models.
• We demonstrated the translational potential of our platform by creating a large database of clinical evidence for over 100 commonly used drugs with high potential to improve therapeutic outcomes for numerous types of cancer.
The deliverables of this thesis comprise seven published journal articles or conference proceedings and one conference proceeding under review, all authored by David Kartchner. Specifically, this thesis includes four high-quality biomedical or information science journal articles and four top-tier conference papers.
Sponsor
Date
2023-12-07
Extent
Resource Type
Text
Resource Subtype
Dissertation
Rights Statement
Rights URI