Title:
Deep Generative Models for Drug Design

Author(s)
Fu, Tianfan
Advisor(s)
Sun, Jimeng
Abstract
Machine learning has drawn significant attention in drug discovery and has driven rapid growth in drug discovery and development research. This dissertation studies deep generative methods for drug design. Despite the rapid progress of machine learning, especially deep learning, in drug discovery, existing drug design methods remain difficult to apply in real-world settings. Two challenges stand out:
• Sample efficiency. Existing drug optimization methods rely heavily on brute-force trial-and-error strategies and suffer from poor sample efficiency. A sample-efficient drug design method would save substantial time and computational resources.
• Data efficiency. Acquiring data labels (e.g., drug properties) is typically laborious and time-consuming in drug discovery because it involves bioassay-based wet-lab experiments or animal models.
This dissertation addresses these challenges by enhancing or designing the following categories of deep generative models:
• Enhancing graph-to-graph neural architectures. Graph-to-graph neural architectures are used in drug design to translate a molecule into a similar molecule with improved properties. We design the copy & refine (CORE) strategy [1] and Molecule Reward in deep generative models (MOLER) [2], which leverages the policy gradient of reinforcement learning. Graph-to-graph methods are easy to train in an end-to-end manner and do not require online oracle queries. However, they suffer from data and sample inefficiency.
• Self-supervised learning (SSL) for generation. SSL models can be pretrained on large unlabeled datasets, alleviating the high demand for labeled data. During generation, the method masks a subset of the drug molecule and samples the masked part based on a deep neural network's prediction. It can be applied to both small-molecule drugs (multi-constraint molecule sampling (MIMOSA) [3]) and biologics design (sampling method for inverse protein folding (SIPF) [4]). SSL-based generation can quantify uncertainty and is data-efficient. However, it suffers from sample inefficiency.
• Differentiable programming for generation. Discrete drug molecules are relaxed into differentiable ones in a continuous space, so the gradient of the neural network can be back-propagated to update the differentiable drug molecules directly. The strategy can also be applied to both small-molecule drugs (differentiable scaffolding tree (DST) [5]) and biologics (constrained energy model (CEM) [6]). Differentiable programming is data- and sample-efficient. However, it still requires online oracle queries.
• Intelligent combinatorial optimization. Traditional combinatorial optimization methods such as genetic algorithms (GA) rely heavily on random-walk-like exploration, which leads to unstable performance. To address this challenge, we propose a Reinforced Genetic Algorithm (RGA) [7] that uses neural models to prioritize the profitable design steps. Intelligent combinatorial optimization suppresses random-walk behavior and enhances efficiency. However, it still requires online oracle queries.
In the last chapter, we describe future work that extends the current research. First, we will build hybrid models that inherit the advantages of multiple categories of generative methods. Second, we will conduct comprehensive experiments to systematically compare these generative methods.
[1] Tianfan Fu, Cao Xiao, Jimeng Sun. CORE: Automatic Molecule Optimization Using Copy and Refine Strategy. Association for the Advancement of Artificial Intelligence (AAAI), 2020.
[2] Tianfan Fu, Cao Xiao, Lucas Glass, Jimeng Sun. MOLER: Incorporate Molecule-Level Reward to Enhance Deep Generative Model for Molecule Optimization. IEEE Transactions on Knowledge and Data Engineering (TKDE), 2021.
[3] Tianfan Fu, Cao Xiao, Xinhao Li, Lucas Glass, Jimeng Sun. MIMOSA: Multi-constraint Molecule Sampling for Molecule Optimization. Association for the Advancement of Artificial Intelligence (AAAI), 2021.
[4] Tianfan Fu, Jimeng Sun. SIPF: Sampling Method for Inverse Protein Folding. The 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2022.
[5] Tianfan Fu*, Wenhao Gao*, Cao Xiao, Jacob Yasonik, Connor W. Coley, Jimeng Sun. Differentiable Scaffolding Tree for Molecular Optimization. International Conference on Learning Representations (ICLR), 2022.
[6] Tianfan Fu, Jimeng Sun. Antibody Complementarity Determining Regions (CDRs) Design Using Constrained Energy Model. The 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2022.
[7] Tianfan Fu*, Wenhao Gao*, Connor W. Coley, Jimeng Sun. Reinforced Genetic Algorithm for Structure-based Drug Design. NeurIPS, 2022.
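The differentiable-programming idea in the abstract (relax a discrete design into a continuous space, then update it directly with gradients) can be illustrated with a toy sketch. This is not the DST or CEM implementation: the "molecule" here is just a short sequence of token choices, the oracle is a hypothetical linear score given by `TOY_WEIGHTS`, and every function name is an assumption made for illustration. Each position's discrete choice is relaxed to a softmax over logits, the logits are updated by gradient ascent on the relaxed score, and the result is discretized by taking the most likely token.

```python
import math


def softmax(logits):
    """Numerically stable softmax over one position's logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def relaxed_score(logits, weights):
    """Hypothetical differentiable oracle: expected weight under the softmax.

    A real system would use a learned property predictor here instead.
    """
    return sum(
        sum(p * w for p, w in zip(softmax(row), w_row))
        for row, w_row in zip(logits, weights)
    )


def optimize(weights, steps=200, lr=1.0):
    """Gradient ascent on the logits of the relaxed (continuous) design."""
    logits = [[0.0] * len(w_row) for w_row in weights]
    for _ in range(steps):
        for row, w_row in zip(logits, weights):
            probs = softmax(row)
            expected = sum(p * w for p, w in zip(probs, w_row))
            # Analytic gradient of sum_k p_k * w_k with respect to logit j.
            for j, (p, w) in enumerate(zip(probs, w_row)):
                row[j] += lr * p * (w - expected)
    return logits


def discretize(logits):
    """Map the relaxed design back to discrete token indices (argmax)."""
    return [max(range(len(row)), key=row.__getitem__) for row in logits]


# Toy setup (assumed): 2 positions, 3 candidate tokens each.
TOY_WEIGHTS = [[0.1, 0.9, 0.3], [0.7, 0.2, 0.4]]
final_logits = optimize(TOY_WEIGHTS)
print(discretize(final_logits))
```

The sketch uses a hand-derived gradient so that it runs with the standard library only; with a neural oracle, an automatic-differentiation framework would supply the gradient instead.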
Date Issued
2023-04-21
Resource Type
Text
Resource Subtype
Dissertation