iProEP: A Computational Predictor for Predicting Promoter.

08:00 EDT 13th June 2019 | BioPortfolio

Summary of "iProEP: A Computational Predictor for Predicting Promoter."

A promoter is a fundamental DNA element located around the transcription start site (TSS) that regulates gene transcription. Promoter recognition is of great significance in determining transcription units, studying gene structure, analyzing gene regulation mechanisms, and annotating gene functional information. Many models have already been proposed to predict promoters. However, the performance of these methods still needs to be improved. In this work, we combined pseudo k-tuple nucleotide composition (PseKNC) with a position-correlation scoring function (PCSF) to encode promoter sequences of Homo sapiens (H. sapiens), Drosophila melanogaster (D. melanogaster), Caenorhabditis elegans (C. elegans), Bacillus subtilis (B. subtilis), and Escherichia coli (E. coli). The minimum redundancy maximum relevance (mRMR) algorithm and an incremental feature selection strategy were then adopted to identify optimal feature subsets. A support vector machine (SVM) was used to distinguish between promoters and non-promoters. In the 10-fold cross-validation test, accuracies of 93.3%, 93.9%, 95.7%, 95.2%, and 93.1% were obtained for H. sapiens, D. melanogaster, C. elegans, B. subtilis, and E. coli, with areas under the receiver operating characteristic curve (AUCs) of 0.974, 0.975, 0.981, 0.988, and 0.976, respectively. Comparative results demonstrated that our method outperforms existing methods for identifying promoters. An online web server was established that can be freely accessed (
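
To make the sequence-encoding step in the summary more concrete, the following is a minimal sketch of plain k-tuple nucleotide composition, the frequency component that PseKNC builds on. It is not the authors' implementation: the full PseKNC encoding adds correlation terms based on physicochemical properties, and the PCSF, mRMR, and SVM steps are not shown. The function name `kmer_composition`, the choice k=2, and the example sequence are assumptions made purely for illustration.

```python
from itertools import product


def kmer_composition(seq, k=2):
    """Return normalized k-tuple frequencies of a DNA sequence.

    The vector has 4**k entries in lexicographic order (AA, AC, AG, ...).
    Illustrative helper only; it is not taken from the iProEP paper.
    """
    seq = seq.upper()
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = 0
    # Slide a window of width k along the sequence and count each k-tuple.
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window in counts:  # skip windows with ambiguous bases such as N
            counts[window] += 1
            total += 1
    if total == 0:
        return [0.0] * len(kmers)
    return [counts[m] / total for m in kmers]


# Hypothetical toy sequence; real promoter samples are much longer.
features = kmer_composition("TATAATGC", k=2)
```

A vector like `features` could then be passed to a feature-selection step and a classifier; keeping the k-tuple order fixed means every sequence maps to a vector with the same layout.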


Journal Details

This article was published in the following journal.

Name: Molecular therapy. Nucleic acids
ISSN: 2162-2531
Pages: 337-346


PubMed Articles [4511 Associated PubMed Articles listed on BioPortfolio]

Promoter analysis and prediction in the human genome using sequence-based deep learning models.

Computational identification of promoters is notoriously difficult as human genes often have unique promoter sequences that provide regulation of transcription and interaction with transcription initi...

Identification of Intrinsically Disordered Proteins and Regions by Length-Dependent Predictors Based on Conditional Random Fields.

Accurate identification of intrinsically disordered proteins/regions (IDPs/IDRs) is critical for predicting protein structure and function. Previous studies have shown that IDRs of different lengths h...

Predictive markers for MGMT promoter methylation in glioblastomas.

The promoter methylation status of the O6-methylguanine-DNA methyltransferase (MGMT) gene has been described as the most important predictor of chemotherapeutic response and patients' survival in gliob...

Net charge of antibody complementarity-determining regions is a key predictor of specificity.

Specificity is one of the most important and complex properties that is central to both natural antibody function and therapeutic antibody efficacy. However, it has proven extremely challenging to def...

Impact of predictor measurement heterogeneity across settings on the performance of prediction models: A measurement error perspective.

It is widely acknowledged that the predictive performance of clinical prediction models should be studied in patients that were not part of the data in which the model was derived. Out-of-sample perfo...

Clinical Trials [1373 Associated Clinical Trials listed on BioPortfolio]

Computational Simulation of Patellar Instability

Computational simulation will be performed to represent motion of knees with a dislocating kneecap. Common surgical treatment methods will be simulated and anatomical parameters commonly a...

Cilengitide, Temozolomide, and Radiation Therapy in Treating Patients With Newly Diagnosed Glioblastoma and Methylated Gene Promoter Status

CENTRIC is a Phase III clinical trial assessing efficacy and safety of the investigational integrin inhibitor, cilengitide, in combination with standard treatment versus standard treatment...

Expression & Epigenetic Silencing of MicroRNA for Predicting Therapeutic Response and Prognosis of HPV-negative HNSCC

A two-part molecular epidemiological study will be conducted to comprehensively assess the association between miR expression and miR promoter methylation and the response to therapy and p...

Which Patients Benefit From Physical Activity on Prescription? A Predictor Analysis of Factors for Increased Physical Activity.

The aim of this study is to explore possible predicting factors associated with physical activity (PA) level change in a 6-month period of physical activity on prescription (PAP) treatment...

Cilengitide, Temozolomide, and Radiation Therapy in Treating Patients With Newly Diagnosed Glioblastoma and Unmethylated Gene Promoter Status

CORE is a Phase II clinical trial in newly diagnosed glioblastoma multiforme (GBM) in patients with an unmethylated promoter of the methylguanine-DNA methyltransferase (MGMT) gene in the t...

Medical and Biotech [MESH] Definitions

DNA sequences which are recognized (directly or indirectly) and bound by a DNA-dependent RNA polymerase during the initiation of transcription. Highly conserved sequences within the promoter include the Pribnow box in bacteria and the TATA BOX in eukaryotes.

Genes whose expression is easily detectable and therefore used to study promoter activity at many positions in a target genome. In recombinant DNA technology, these genes may be attached to a promoter region of interest.

Promoter-specific RNA polymerase II transcription factor that binds to the GC box, one of the upstream promoter elements, in mammalian cells. The binding of Sp1 is necessary for the initiation of transcription in the promoters of a variety of cellular and viral GENES.

Models connecting initiating events at the cellular and molecular level to population-wide impacts. Computational models may be at levels relating toxicology to adverse effects.


Relevant Topic

Bioinformatics is the application of computer software and hardware to the management of biological data to create useful information. Computers are used to gather, store, analyze and integrate biological and genetic information which can then be applied...
