Identifying and generating candidate antibacterial peptides with long short-term memory recurrent neural networks

Youmans, Michael Thomas
Qiu, Peng
Rates of antibiotic resistance among pathogenic bacteria are rising, and the development of resistance to current antibiotics poses a global health hazard. Antibacterial peptides are an active area of research that may aid in developing new ways to combat pathogenic bacteria. Machine learning offers an efficient way to identify candidate antibacterial peptides, accelerating the search beyond what experimental testing alone can achieve. Many current machine learning methods for identifying antibacterial peptides rely on constructing a fixed-length feature vector from amino acid level features. Because peptides vary in the number of amino acids they contain, it is not obvious how best to turn amino acid features into a feature vector representing the entire peptide. Many such methods search for periodic patterns in the amino acid level features and then use a scalar representing the strength of each periodic pattern as a feature for the whole peptide. This approach can be hit or miss, since it is difficult to know in advance which periodic patterns are relevant to the classification or regression task. In this work, we use recurrent neural networks that take in variable-length sequences of amino acid features and automatically extract a feature representation appropriate for the given machine learning task. We explore generating random peptides and running them through a recurrent neural network trained to identify potentially interesting peptides, and we attempt to find the regions of the various peptides that are responsible for their activity. Finally, we discuss the potential for sequential generative adversarial networks to generate peptides that are, in theory, drawn from the same distribution as training data containing real antibacterial peptides.
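The contrast drawn above, between a hand-picked periodic-pattern scalar and a recurrent network that summarizes a whole variable-length sequence, can be sketched in a few lines. The sketch below is illustrative only and is not the thesis's implementation. `periodic_moment` uses the form of the hydrophobic moment, |Σ h_n e^{i n δ}|, as one familiar example of a periodic-pattern feature; the per-residue values it takes are assumed inputs (for example, a hydrophobicity scale). `TinyLSTM` is a hypothetical, untrained, single-layer LSTM written in plain Python so that it runs without dependencies. In practice one would use a framework LSTM and learn the weights for the classification task.

```python
import cmath
import math
import random


def periodic_moment(values, angle_deg):
    """Strength of a periodic pattern in per-residue values at one angle.

    Computes |sum_n h_n * exp(i * n * delta)|, the form of the hydrophobic
    moment. The caller must choose the angle in advance (about 100 degrees
    for an alpha helix), which is the "hit or miss" step the text describes.
    """
    delta = math.radians(angle_deg)
    return abs(sum(h * cmath.exp(1j * n * delta) for n, h in enumerate(values)))


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class TinyLSTM:
    """Hypothetical single-layer LSTM with random, untrained weights."""

    def __init__(self, input_size, hidden_size, seed=0):
        rng = random.Random(seed)
        self.hidden_size = hidden_size
        cols = input_size + hidden_size
        # One weight matrix per gate (input i, forget f, candidate g, output o),
        # each acting on the concatenation [x_t, h_{t-1}], plus a bias vector.
        self.W = {g: [[rng.uniform(-0.1, 0.1) for _ in range(cols)]
                      for _ in range(hidden_size)] for g in "ifgo"}
        self.b = {g: [0.0] * hidden_size for g in "ifgo"}

    def _gate(self, g, z, act):
        # Affine map of z followed by the gate's activation function.
        return [act(sum(w * v for w, v in zip(row, z)) + b)
                for row, b in zip(self.W[g], self.b[g])]

    def encode(self, sequence):
        """Consume per-residue feature vectors; return the final hidden state.

        The output length is hidden_size regardless of sequence length, so it
        can serve as a fixed-length representation of the whole peptide.
        """
        h = [0.0] * self.hidden_size
        c = [0.0] * self.hidden_size
        for x in sequence:
            z = list(x) + h
            i = self._gate("i", z, sigmoid)
            f = self._gate("f", z, sigmoid)
            g = self._gate("g", z, math.tanh)
            o = self._gate("o", z, sigmoid)
            c = [ft * ct + it * gt for ft, ct, it, gt in zip(f, c, i, g)]
            h = [ot * math.tanh(ct) for ot, ct in zip(o, c)]
        return h


if __name__ == "__main__":
    lstm = TinyLSTM(input_size=2, hidden_size=4)
    short_seq = [[0.5, 1.0], [-0.2, 0.3]]
    long_seq = [[0.1, 0.0]] * 25
    print(len(lstm.encode(short_seq)), len(lstm.encode(long_seq)))  # 4 4
    print(periodic_moment([1.0, 1.0, 1.0], 0))  # 3.0
```

The design point is that the periodic feature needs the angle fixed beforehand, while the LSTM maps a 2-residue and a 25-residue input to vectors of the same size and, once trained, learns which sequence patterns matter.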