The number of unannotated gene sequences in databases is growing as sequencing technology advances, so computational methods that predict the functions of unannotated genes are needed. The search for novel enzymes for metabolic engineering applications further motivates sequence annotation. Here, enzyme functions are predicted using two general approaches, each implemented with several machine learning algorithms. First, Enzyme-models (E-models) predict Enzyme Commission (EC) numbers from amino acid sequence information alone. Second, Substrate-Enzyme models (SE-models) predict the substrates of enzymatic reactions together with EC numbers, and Substrate-Enzyme-Product models (SEP-models) predict substrates, products and EC numbers. While the accuracy of E-models is not optimal, SE-models and SEP-models predict EC numbers and reactions with high accuracy using all tested machine learning-based methods. For example, a single Random Forests-based SEP-model predicts EC first digits with an Average AUC score of over 0.94. Various metrics indicate that the strategy of combining sequence and chemical structure information is effective at improving enzyme reaction prediction.
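The idea of combining sequence and chemical structure information, and of scoring a multi-class predictor with an average AUC, can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: the abstract does not state which sequence descriptors, chemical fingerprints, or AUC averaging scheme the authors used. The sketch uses amino acid composition for the sequence, a toy bigram hash of a SMILES string as a stand-in for a real fingerprint, and a macro-averaged one-vs-rest AUC. All function names are hypothetical.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def sequence_features(seq):
    """Amino acid composition: fraction of each standard residue (assumed descriptor)."""
    counts = Counter(seq.upper())
    total = sum(counts[a] for a in AMINO_ACIDS) or 1  # avoid division by zero
    return [counts[a] / total for a in AMINO_ACIDS]


def substrate_features(smiles, n_bits=16):
    """Toy stand-in for a chemical fingerprint (NOT a real cheminformatics method):
    sets one bit per character bigram of the SMILES string, folded into n_bits."""
    bits = [0] * n_bits
    for a, b in zip(smiles, smiles[1:]):
        bits[(ord(a) * 31 + ord(b)) % n_bits] = 1
    return bits


def se_features(seq, substrate_smiles):
    """Input vector for a hypothetical SE-style model: sequence part + substrate part."""
    return sequence_features(seq) + substrate_features(substrate_smiles)


def binary_auc(labels, scores):
    """AUC as the probability that a random positive outranks a random negative;
    ties count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y]
    neg = [s for y, s in zip(labels, scores) if not y]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_auc(true_classes, class_scores):
    """Macro-average of one-vs-rest AUCs (assumed reading of 'Average AUC').
    class_scores maps each class (e.g. an EC first digit) to per-sample scores."""
    aucs = [
        binary_auc([c == k for c in true_classes], scores)
        for k, scores in class_scores.items()
    ]
    return sum(aucs) / len(aucs)


if __name__ == "__main__":
    x = se_features("MKVLAAGIV", "CCO")  # made-up sequence; ethanol as SMILES
    print(len(x))  # 20 composition values + 16 fingerprint bits
    ec_digits = [1, 2, 1, 2]  # made-up EC first digits
    scores = {1: [0.9, 0.1, 0.8, 0.2], 2: [0.1, 0.9, 0.2, 0.8]}
    print(average_auc(ec_digits, scores))
```

In practice the concatenated vector would be fed to a classifier such as a random forest, and a real fingerprint from a cheminformatics toolkit would replace the toy hash; the sketch only shows how the two information sources can share one input vector and how a per-class AUC can be averaged.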
This article was published in the following journal.
Name: Journal of Chemical Information and Modeling
Machine learning has been increasingly used to develop predictive models to diagnose different disease conditions. The heterogeneity of the kidney transplant population makes predicting graft outcomes...
Osteoporosis is hard to detect before it manifests symptoms and complications. In this study, we evaluated machine learning models for identifying individuals with abnormal bone mineral density (BMD) ...
The results of machine learning models can often be difficult to interpret, especially for domain experts. Audio Explorer, the winning entry of the 2018 VAST Challenge, is an interactive data explorat...
Postoperative gastrointestinal leak and venous thromboembolism (VTE) are devastating complications of bariatric surgery. The performance of currently available predictive models for these complication...
Noncovalent inhibitors of protein kinases have different modes of action. They bind to the active or inactive form of kinases, compete with ATP, stabilize inactive kinase conformations, or act through...
The aim of this study is to get a proof of concept for using a computational model of fetal haemodynamics, combined with machine learning based on Doppler patterns of the fetal cardiovascu...
Patients with Chronic Obstructive Pulmonary Disease (COPD) who are admitted to hospital are at high risk of readmission. While therapies have improved and there are evidence-based guidelin...
Investigating glucose response to Mediterranean and regular diets in healthy children in order to develop specific pediatric machine-learning for predicting the personalized glucose respon...
Patients undergoing cytoreductive surgery with hyperthermic intraoperative chemotherapy (CRS with HIPEC) are prone to postoperative kidney dysfunction. Previous models predicting kidney in...
This study will be collecting data on participants undergoing lower body negative pressure (LBNP) to simulate progressive blood loss. The goal of the study is to collect data to allow for ...
A MACHINE LEARNING paradigm used to make predictions about future instances based on a given set of labeled paired input-output training (sample) data.
A MACHINE LEARNING paradigm used to make predictions about future instances based on a given set of unlabeled training (sample) data.
A SUPERVISED MACHINE LEARNING algorithm that learns to assign labels to objects from a set of training examples. Examples are learning to recognize fraudulent credit card activity by examining hundreds or thousands of fraudulent and non-fraudulent transactions, or learning to make a disease diagnosis or prognosis based on automatic classification of microarray gene expression profiles drawn from hundreds or thousands of samples.
Usually refers to the use of mathematical models in the prediction of learning to perform tasks based on the theory of probability applied to responses; it may also refer to the frequency of occurrence of the responses observed in the particular study.
A type of ARTIFICIAL INTELLIGENCE that enables COMPUTERS to independently initiate and execute LEARNING when exposed to new data.
Bioinformatics is the application of computer software and hardware to the management of biological data to create useful information. Computers are used to gather, store, analyze and integrate biological and genetic information which can then be applied...
DNA sequencing is the process of determining the precise order of nucleotides within a DNA molecule. During DNA sequencing, the bases of a small fragment of DNA are sequentially identified from signals emitted as each fragment is re-synthesized from a ...
Enzymes are proteins that catalyze (i.e., increase the rates of) chemical reactions. In enzymatic reactions, the molecules at the beginning of the process, called substrates, are converted into different molecules, called products. Almost all chemical re...