New computational approaches for virtual screening applications are constantly being developed. However, before a particular tool is used to search for new active compounds, its effectiveness in that type of task must be examined. In this study, we conducted a detailed analysis of various aspects of preparing datasets for such an evaluation. We propose a protocol for fetching data from the ChEMBL database, examine various compound representations in terms of the possible bias resulting from the way they are generated, and define a new metric for comparing the structural similarity of compounds that is in line with chemical intuition. The newly developed metric is also used to evaluate various approaches for dividing a dataset into training and test sets, which are themselves examined in detail as possible sources of bias in the results. Finally, machine learning methods are applied in cross-validation studies of the datasets constructed within the paper, which constitute benchmarks for the assessment of computational methods developed for virtual screening tasks. Additionally, analogous datasets for class A G protein-coupled receptors (the 100 targets with the highest number of records) were prepared. They are available at http://gmum.net/benchmarks/, together with a script enabling reproduction of all results, available at https://github.com/lesniak43/ananas.
This article was published in the following journal.
Name: Journal of Chemical Information and Modeling
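The abstract's point that the train/test division can bias results may be easier to see with a small example. The sketch below is not the metric or the splitting protocol from the paper: it assumes fingerprints are represented as sets of "on" bit indices, swaps in the standard Tanimoto coefficient as the similarity measure, and uses invented toy compounds. The helper names (`tanimoto`, `nearest_train_similarity`) are hypothetical. It reports, for each test compound, its similarity to the closest training compound; high values suggest the test set is too easy.

```python
from typing import FrozenSet, List, Sequence

# Assumption: a fingerprint is the set of indices of its "on" bits.
Fingerprint = FrozenSet[int]


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Standard Tanimoto coefficient: |A & B| / |A | B|."""
    if not a and not b:
        return 0.0  # convention chosen here for two empty fingerprints
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def nearest_train_similarity(
    train: Sequence[Fingerprint], test: Sequence[Fingerprint]
) -> List[float]:
    """For each test compound, similarity to its most similar training compound."""
    return [max(tanimoto(t, r) for r in train) for t in test]


# Toy data (invented): two pairs of close analogues.
compounds = [
    frozenset({1, 2, 3, 4}),
    frozenset({1, 2, 3, 5}),
    frozenset({10, 11, 12}),
    frozenset({10, 11, 13}),
]

# Split A separates analogues across train and test (potentially optimistic).
leaky = nearest_train_similarity(
    [compounds[0], compounds[2]], [compounds[1], compounds[3]]
)

# Split B keeps each analogue pair on one side (a harder, more realistic test).
grouped = nearest_train_similarity(compounds[:2], compounds[2:])

print("analogues split apart:", leaky)
print("analogues kept together:", grouped)
```

In the first split every test compound has a close analogue in the training set, so a model can score well by near-memorization; in the second split the test compounds share no bits with the training set.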
The combination of computational chemistry and computational materials science with machine learning and artificial intelligence provides a powerful way of relating structural features of nanomaterial...
In recent years, the importance of computational chemistry approaches has grown rapidly because of recent advances in computational software and hardware. Automated reaction path search is one of prom...
Advanced machine learning methods combined with large sets of health screening data provide opportunities for diagnostic value in human and veterinary medicine.
Recently, copy number variation (CNV) has gained considerable interest as a type of genomic variation that plays an important role in complex phenotypes and disease susceptibility. Since a number of C...
High-throughput experiments including combinatorial chemistry are useful for generating large amounts of data within a short period of time. Machine learning can be used to predict the regularity of a...
Demonstrate that the use of benchmarking improves quality of patient care, in particular the control of diabetes, lipids and blood pressure, by determining the percentage of patients in th...
An important reason for the high cost of hemodialysis treatment in China is that hemodialysis machines and related products are mainly imported. Hemodialysis machine is the basi...
All patients admitted in Geneva University Hospitals (GUH) emergency department (ED) are triaged using the Swiss Emergency Triage Scale (SETS), a 4-level symptom-based triage scale. At the...
Machine learning methods potentially provide a highly accurate and detailed assessment of expected individual patient risk before elective cardiac surgery. Correct anticipation of this ris...
This study is a randomized, multi-center, crossover study of a domestic FM peritoneal dialysis machine and Baxter HOMECHOICE. It aims to verify safety, effectiveness and manipulability of ...
Apparatus that provides mechanical circulatory support during open-heart surgery, bypassing the heart to facilitate surgery on the organ. The basic function of the machine is to oxygenate the body's venous supply of blood and then pump it back into the arterial system. The machine also provides intracardiac suction, filtration, and temperature control. Some of the more important components of these machines include pumps, oxygenators, temperature regulators, and filters. (UMDNS, 1999)
A system in which the functions of the man and the machine are interrelated and necessary for the operation of the system.
A MACHINE LEARNING paradigm used to make predictions about future instances based on a given set of labeled paired input-output training (sample) data.
A MACHINE LEARNING paradigm used to make predictions about future instances based on a given set of unlabeled paired input-output training (sample) data.
Method of measuring performance against established standards of best practice.