Accurate Prokaryotic Gene Annotation using Structure-aware Protein Language Models

Author(s)
Tu, Tony
Advisor(s)
Aghazadeh, Amirali
Kumar, Srijan
Abstract
Prokaryotic gene annotation is an important problem at the intersection of computer science and computational biology. Gene annotation involves identifying the location of genes within the genome, determining the start and end points of the coding regions, and predicting the functions of the encoded proteins. This information is important for understanding the biological processes and pathways involved in prokaryotic metabolism, growth, and adaptation to different environments. More broadly, prokaryotic gene annotation plays a crucial role in understanding the biology of prokaryotes and their interactions with their environment, and therefore has practical applications in fields such as medicine and biotechnology. We introduce ProtiGeno, a novel algorithm that improves existing prokaryotic gene annotation tools using protein language models.
Date
2023-05-08
Resource Type
Text
Resource Subtype
Thesis