Evaluation of Cross-Validation Strategies in Sequence-Based Binding Prediction Using Deep Learning.

07:00 EST 7th February 2019 | BioPortfolio

Summary of "Evaluation of Cross-Validation Strategies in Sequence-Based Binding Prediction Using Deep Learning."

Binding prediction between targets and drug-like compounds through Deep Neural Networks has generated promising results in recent years, outperforming traditional machine learning-based methods. However, the generalization capability of these classification models remains an open issue. In this work, we explored how different cross-validation strategies applied to data from different molecular databases affect the performance of binding prediction proteochemometrics models. These strategies are: (1) random splitting, (2) splitting based on K-means clustering (of both actives and inactives), (3) splitting based on source database and (4) splitting based on both the clustering and the source database. These schemes are applied to a Deep Learning proteochemometrics model and to a simple logistic regression model used as a baseline. Additionally, two different ways of describing molecules in the model are tested: (1) by their SMILES and (2) by three fingerprints. The classification performance of our Deep Learning-based proteochemometrics model is comparable to the state of the art. Our results show that the lack of generalization of these models is due to a bias in public molecular databases and that a restrictive cross-validation scheme based on compound clustering leads to worse but more robust and credible results. Our results also show better performance when molecules are represented by their fingerprints.
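The abstract names four splitting schemes but gives no implementation details. The following is a minimal, standard-library-only Python sketch of how such splits could be produced, under assumptions that are not taken from the paper: compounds are represented as numeric vectors (for instance fingerprint bits), clusters come from a plain Lloyd's k-means, and the combined scheme (4) is interpreted as holding out one source database and then dropping from training any compound that shares a cluster with a test compound. All function names, the toy data and the database labels `db_A`/`db_B` are hypothetical. Since the abstract clusters both actives and inactives, one way to mimic that would be to call `kmeans` separately on each class and offset the second set of labels so they do not collide.

```python
"""Sketch of the four split schemes listed in the abstract (stdlib only).

Assumptions, not taken from the paper: compounds are numeric vectors
(e.g. fingerprint bits), clusters come from plain Lloyd's k-means, and
scheme (4) holds out one source database and also drops from training
every compound that shares a cluster with a test compound.
"""
import random
from typing import List, Sequence, Set, Tuple

Split = Tuple[List[int], List[int]]  # (train indices, test indices)


def kmeans(points: Sequence[Sequence[float]], k: int,
           iters: int = 20, seed: int = 0) -> List[int]:
    """Plain Lloyd's k-means; returns one cluster label per point."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(list(points), k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest centroid by squared Euclidean distance.
        for i, p in enumerate(points):
            labels[i] = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
            )
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [points[i] for i in range(len(points)) if labels[i] == c]
            if members:  # an emptied cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def random_split(n: int, test_frac: float, seed: int = 0) -> Split:
    """Scheme (1): shuffle all indices and cut off a test fraction."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_test = int(round(n * test_frac))
    return sorted(idx[n_test:]), sorted(idx[:n_test])


def group_split(groups: Sequence[object], test_groups: Set[object]) -> Split:
    """Schemes (2) and (3): each group (cluster label or source database)
    lands entirely in train or entirely in test, never in both."""
    train = [i for i, g in enumerate(groups) if g not in test_groups]
    test = [i for i, g in enumerate(groups) if g in test_groups]
    return train, test


def cluster_and_database_split(clusters: Sequence[int],
                               databases: Sequence[str],
                               test_db: str) -> Split:
    """Scheme (4), hypothetical variant: test on one database, then remove
    from training any compound sharing a cluster with a test compound."""
    test = [i for i, d in enumerate(databases) if d == test_db]
    test_clusters = {clusters[i] for i in test}
    train = [i for i, d in enumerate(databases)
             if d != test_db and clusters[i] not in test_clusters]
    return train, test


if __name__ == "__main__":
    # Toy data: two well-separated groups of 2-D vectors drawn from two
    # hypothetical source databases, db_A and db_B.
    X = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1], [5.0, 5.0], [5.1, 4.9], [4.9, 5.2]]
    db = ["db_A", "db_A", "db_B", "db_A", "db_A", "db_A"]
    clusters = kmeans(X, k=2)
    print("random:      ", random_split(len(X), test_frac=0.5))
    print("cluster:     ", group_split(clusters, {clusters[0]}))
    print("database:    ", group_split(db, {"db_B"}))
    print("cluster + db:", cluster_and_database_split(clusters, db, "db_B"))
```

On the toy data, holding out `db_B` alone (scheme 3) tests on compound 2 while keeping its near neighbours 0 and 1 in training, giving `([0, 1, 3, 4, 5], [2])`; the combined variant removes them and gives `([3, 4, 5], [2])`. That kind of near-duplicate leakage is what a cluster-aware split guards against, which fits the abstract's observation that restrictive clustering-based schemes yield lower but more credible scores.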


Journal Details

This article was published in the following journal.

Name: Journal of chemical information and modeling
ISSN: 1549-960X


PubMed Articles [36968 Associated PubMed Articles listed on BioPortfolio]

Sequence-based analysis and prediction of lantibiotics: A machine learning approach.

Lantibiotics, an important group of ribosomally synthesized peptides, represent an important arsenal of novel promising antimicrobials showing high potency in fighting against the prevalence of antibi...

MIonSite: Ligand-specific prediction of metal ion-binding sites via enhanced AdaBoost algorithm with protein sequence information.

Accurately targeting metal ion-binding sites solely from protein sequences is valuable for both basic experimental biology and drug discovery studies. Although considerable progress has been made, met...

Structure-based methods for binding mode and binding affinity prediction for peptide-MHC complexes.

Understanding the mechanisms involved in the activation of an immune response is essential to many fields in human health, including vaccine development and personalized cancer immunotherapy. A centra...

A decision tree-based integrated testing strategy for tailor-made carcinogenicity evaluation of test substances using genotoxicity test results and chemical spaces.

Genotoxicity evaluation has been widely used to estimate the carcinogenicity of test substances during safety evaluation. However, the latest strategies using genotoxicity tests give more weight to se...

Algebraic shortcuts for leave-one-out cross-validation in supervised network inference.

Supervised machine learning techniques have traditionally been very successful at reconstructing biological networks, such as protein-ligand interaction, protein-protein interaction and gene regulator...

Clinical Trials [11682 Associated Clinical Trials listed on BioPortfolio]

Validation of the French Version of the Xerostomia Inventory

The main objective of this study is to achieve the cross-cultural and psychometric validation of a French version of the Xerostomia Inventory, which was initially developed in English. This wil...

Rapid Sequence Induction EU: Electronic Survey (RSIEU)

Rapid sequence induction (RSI) is a common part of routine anesthesiology practice. However, several steps of RSI are not based on evidence-based data (EBM) and are considered controversial...

Clinical Validation Study of Multi-EPI Mix

This study aims to assess the diagnostic validity of a new minute-MRI sequence for neuroradiological evaluation in comparison to conventional MRI.

Biomarkers of Meats and Potatoes Intake.

The discovery of biomarkers for the intake of meats and potatoes is needed for an accurate assessment of the intake of these foods. Twelve healthy subjects were enrolled in a controlled, c...

Alternate Methodology of Pulse Oximeter Validation

This study will determine if the replacement of the measured arterial blood oxygen saturation with expired (end-tidal) oxygen value is an acceptable method to calculate the accuracy of pul...

Medical and Biotech [MESH] Definitions

A prediction of the probable outcome of a disease based on an individual's condition and the usual course of the disease as seen in similar situations.

Validation of the sex of an individual by means of the bones of the SKELETON. It is most commonly based on the appearance of the PELVIS; SKULL; STERNUM; and/or long bones.

A subfamily of transmembrane proteins from the superfamily of ATP-BINDING CASSETTE TRANSPORTERS that are closely related in sequence to ATP-BINDING CASSETTE, SUB-FAMILY B, MEMBER 1. When overexpressed, they function as ATP-dependent efflux pumps able to extrude lipophilic drugs, especially ANTINEOPLASTIC AGENTS, from cells causing multidrug resistance (DRUG RESISTANCE, MULTIPLE). Although ATP BINDING CASSETTE TRANSPORTER, SUB-FAMILY B share functional similarities to MULTIDRUG RESISTANCE-ASSOCIATED PROTEINS they are two distinct subclasses of ATP-BINDING CASSETTE TRANSPORTERS, and have little sequence homology.

The prediction or projection of the nature of future problems or existing conditions based upon the extrapolation or interpretation of existing scientific data or by the application of scientific methodology.

Predicting the time of OVULATION can be achieved by measuring the preovulatory elevation of ESTRADIOL; LUTEINIZING HORMONE or other hormones in BLOOD or URINE. Accuracy of ovulation prediction depends on the completeness of the hormone profiles, and the ability to determine the preovulatory LH peak.

Relevant Topics

Drug Discovery
In the fields of medicine, biotechnology and pharmacology, drug discovery is the process by which drugs are dis...

Antiretroviral therapy
Standard antiretroviral therapy (ART) consists of the combination of at least three antiretroviral (ARV) drugs to maximally suppress the HIV virus and stop the progression of HIV disease. Huge reductions have been seen in rates of death and suffering whe...
