Exploration and Evaluation of Machine Learning-Based Models for Predicting Enzymatic Reactions.

07:00 EST 13th February 2020 | BioPortfolio

Summary of "Exploration and Evaluation of Machine Learning-Based Models for Predicting Enzymatic Reactions."

The number of unannotated gene sequences in databases is growing as sequencing technology advances, so computational methods for predicting the functions of unannotated genes are needed. The search for novel enzymes for metabolic engineering applications further motivates sequence annotation. Here, enzyme functions are predicted using two general approaches, each tested with several machine learning algorithms. First, Enzyme models (E-models) predict Enzyme Commission (EC) numbers from amino acid sequence information alone. Second, Substrate-Enzyme models (SE-models) predict the substrates of enzymatic reactions together with EC numbers, and Substrate-Enzyme-Product models (SEP-models) predict substrates, products and EC numbers. While the accuracy of the E-models is not optimal, the SE-models and SEP-models predict EC numbers and reactions with high accuracy for all tested machine learning methods. For example, a single Random Forest-based SEP-model predicts the first digit of the EC number with an average AUC score of over 0.94. Various metrics indicate that the current strategy of combining sequence and chemical structure information is effective at improving enzyme reaction prediction.
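
The difference between the three model types can be illustrated with a minimal Python sketch of how their inputs might be assembled and how an average AUC might be computed. The summary does not say which descriptors or which averaging scheme the authors used, so everything here is an assumption made for illustration: amino acid composition stands in for the sequence features, a hashed character-bigram fingerprint of a SMILES string stands in for the chemical structure features, and "average AUC" is read as the macro-average of one-vs-rest AUCs. All function names and the example sequence are hypothetical.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def sequence_features(seq):
    """Amino acid composition: fraction of each standard residue (assumed descriptor)."""
    counts = Counter(seq.upper())
    total = sum(counts[a] for a in AMINO_ACIDS) or 1  # avoid division by zero
    return [counts[a] / total for a in AMINO_ACIDS]


def molecule_features(smiles, n_bits=16):
    """Toy structure fingerprint: SMILES character bigrams hashed into n_bits bits."""
    bits = [0] * n_bits
    for i in range(len(smiles) - 1):
        bigram = smiles[i:i + 2]
        # Deterministic hash; the built-in hash() of str is salted per process.
        idx = sum(ord(c) * (31 ** k) for k, c in enumerate(bigram)) % n_bits
        bits[idx] = 1
    return bits


def build_features(seq, substrate=None, product=None):
    """E-model input: sequence only. SE-model: plus substrate. SEP-model: plus product."""
    features = sequence_features(seq)
    if substrate is not None:
        features += molecule_features(substrate)
    if product is not None:
        features += molecule_features(product)
    return features


def binary_auc(labels, scores):
    """AUC as the probability that a positive outranks a negative (ties count 0.5)."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_auc(true_classes, class_scores):
    """Macro-average of one-vs-rest AUCs; class_scores maps class -> per-sample scores."""
    aucs = []
    for cls, scores in class_scores.items():
        labels = [1 if t == cls else 0 for t in true_classes]
        aucs.append(binary_auc(labels, scores))
    return sum(aucs) / len(aucs)


# Hypothetical example: a made-up peptide with ethanol (CCO) as substrate
# and acetaldehyde (CC=O) as product.
seq = "MKTAYIAKQR"
e_input = build_features(seq)
se_input = build_features(seq, substrate="CCO")
sep_input = build_features(seq, substrate="CCO", product="CC=O")
print(len(e_input), len(se_input), len(sep_input))  # 20 36 52
```

In this sketch the SE-model and SEP-model inputs are simply the sequence vector with one or two structure vectors appended, so any classifier that accepts a fixed-length vector (a random forest, for instance) could be trained on each of the three inputs and compared using `average_auc` over the first EC digit.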


Journal Details

This article was published in the following journal.

Name: Journal of Chemical Information and Modeling
ISSN: 1549-960X


