Title:
Deep Generative Models for Drug Design

dc.contributor.advisor Sun, Jimeng
dc.contributor.author Fu, Tianfan
dc.contributor.committeeMember Coley, Connor
dc.contributor.committeeMember Luo, Yunan
dc.contributor.committeeMember Zhang, Xiuwei
dc.contributor.committeeMember Zitnik, Marinka
dc.contributor.department Computational Science and Engineering
dc.date.accessioned 2023-05-18T17:50:55Z
dc.date.available 2023-05-18T17:50:55Z
dc.date.created 2023-05
dc.date.issued 2023-04-21
dc.date.submitted May 2023
dc.date.updated 2023-05-18T17:50:56Z
dc.description.abstract Machine learning has drawn significant attention in drug discovery and has driven rapid growth in drug discovery and development research. This dissertation studies deep generative methods for drug design. Despite the rapid progress of machine learning, especially deep learning, in drug discovery, existing drug design methods remain difficult to apply in real-world settings. Two challenges stand out, sample efficiency and data requirements, which are summarized as follows. • Sample efficiency. Existing drug optimization methods rely heavily on brute-force trial-and-error strategies and suffer from poor sample efficiency. A sample-efficient drug design method would save substantial time and computational resources. • Data efficiency. Acquiring data labels (e.g., drug properties) is typically laborious and time-consuming in drug discovery because it involves bioassay-based wet-lab experiments or animal models. This dissertation addresses these challenges by enhancing or designing the following categories of deep generative models: • Enhancing graph-to-graph neural architecture. Graph-to-graph neural architectures are used in drug design to translate a molecule into a similar molecule with improved properties. We design the copy & refine (CORE) strategy [1] and Molecule Reward in deep generative models (MOLER) [2], which leverages the policy gradient method from reinforcement learning. Graph-to-graph methods are easy to train in an end-to-end manner and do not require online oracle queries. However, they suffer from data and sample inefficiency. • Self-supervised learning (SSL) for generation. SSL models can be pretrained on large unlabeled datasets, alleviating the high demand for labeled data. During the generation process, SSL masks a subset of the drug molecule and samples the masked part based on a deep neural network’s prediction. 
This approach can be applied to both small-molecule drugs (Multi-objective molecule sampling (MIMOSA) [3]) and biologics design (sampling method for inverse protein folding (SIPF) [4]). Its advantages are that self-supervised-learning-based generation can quantify uncertainty and is data-efficient. However, it suffers from sample inefficiency. • Differentiable programming for generation. Discrete drug molecules are relaxed to differentiable representations in a continuous space, so that the gradient of a neural network can be back-propagated to update the relaxed molecules directly. This strategy can also be applied to both small-molecule drugs (differentiable scaffolding tree (DST) [5]) and biologics (constrained energy model (CEM) [6]). Differentiable programming is data- and sample-efficient. However, it still requires online oracle queries. • Intelligent combinatorial optimization. Traditional combinatorial optimization methods such as genetic algorithms (GA) rely heavily on random-walk-like exploration, which leads to unstable performance. To address this challenge, we propose a Reinforced Genetic Algorithm (RGA) that uses neural models to prioritize the profitable design steps. Intelligent combinatorial optimization suppresses random-walk behavior and enhances efficiency [7]. However, it still requires online oracle queries. In the last chapter, we describe future work that extends the current research. First, we will build hybrid models that inherit the advantages of multiple categories of generative methods. Second, we will conduct comprehensive experiments to systematically compare these generative methods. [1] Tianfan Fu, Cao Xiao, Jimeng Sun. CORE: Automatic Molecule Optimization Using Copy and Refine Strategy. Association for the Advancement of Artificial Intelligence (AAAI), 2020. [2] Tianfan Fu, Cao Xiao, Lucas Glass, Jimeng Sun. MOLER: Incorporate Molecule-Level Reward to Enhance Deep Generative Model for Molecule Optimization. 
IEEE Transactions on Knowledge and Data Engineering (TKDE), 2021. [3] Tianfan Fu, Cao Xiao, Xinhao Li, Lucas Glass, Jimeng Sun. MIMOSA: Multi-constraint Molecule Sampling for Molecule Optimization. Association for the Advancement of Artificial Intelligence (AAAI), 2021. [4] Tianfan Fu, Jimeng Sun. SIPF: Sampling Method for Inverse Protein Folding. The 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2022. [5] Tianfan Fu*, Wenhao Gao*, Cao Xiao, Jacob Yasonik, Connor W. Coley, Jimeng Sun. Differentiable Scaffolding Tree for Molecular Optimization. International Conference on Learning Representations (ICLR), 2022. [6] Tianfan Fu, Jimeng Sun. Antibody Complementarity Determining Regions (CDRs) Design Using Constrained Energy Model. The 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2022. [7] Tianfan Fu*, Wenhao Gao*, Connor W. Coley, Jimeng Sun. Reinforced Genetic Algorithm for Structure-Based Drug Design. NeurIPS, 2022.
dc.description.degree Ph.D.
dc.format.mimetype application/pdf
dc.identifier.uri https://hdl.handle.net/1853/72001
dc.publisher Georgia Institute of Technology
dc.subject drug design
dc.subject deep generative model
dc.title Deep Generative Models for Drug Design
dc.type Text
dc.type.genre Dissertation
dspace.entity.type Publication
local.contributor.advisor Sun, Jimeng
local.contributor.corporatename College of Computing
local.contributor.corporatename School of Computational Science and Engineering
relation.isAdvisorOfPublication 8b48f6d2-28c9-4413-8148-531f91a7e5f9
relation.isOrgUnitOfPublication c8892b3c-8db6-4b7b-a33a-1b67f7db2021
relation.isOrgUnitOfPublication 01ab2ef1-c6da-49c9-be98-fbd1d840d2b1
thesis.degree.level Doctoral
Files
Original bundle
Name: FU-DISSERTATION-2023.pdf
Size: 2.79 MB
Format: Adobe Portable Document Format