Enrichment-based proteogenomics identifies microproteins, missing proteins, and novel smORFs in Saccharomyces cerevisiae.

08:00 EDT 13th June 2018 | BioPortfolio

Summary of "Enrichment-based proteogenomics identifies microproteins, missing proteins, and novel smORFs in Saccharomyces cerevisiae."

Microproteins are peptides of 100 amino acids (AA) or fewer, encoded by small open reading frames (smORFs). Microproteins have been shown to participate in and regulate a wide range of cellular functions. However, annotating and identifying them is challenging, in part owing to their low molecular weight, low abundance, and hydrophobicity. As a result, smORFs are often left unannotated during genome annotation, and their products are difficult to identify at the protein level. Large-scale enrichment of microproteins in proteogenomics has made it possible to efficiently identify microproteins and discover unannotated smORFs in Saccharomyces cerevisiae. Here, we integrated four microprotein-specific enrichment strategies to enhance coverage. We identified 117 microproteins, verified 31 missing proteins (MPs) by spectrum quality checking, and discovered 3 novel smORFs. The three novel smORFs (YKL104W-A, YHR052C-B, and YHR054C-B) were retained after spectrum quality checking, peptide synthesis, homologue matching, etc. This study not only demonstrates that potential smORF candidates remain to be annotated in an extensively studied organism, but also presents an efficient strategy for the discovery of small MPs. All MS datasets have been deposited to the ProteomeXchange with identifier PXD008586 (Username:; Password: UNEbNk3j).
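To make the length-based definition concrete, the following is a minimal sketch of a forward-strand scan for ORFs that would encode 100 residues or fewer. It is not the authors' pipeline, which relied on enrichment and mass spectrometry; the function name `find_smorfs`, its parameters, and the toy sequence are hypothetical, and the sketch assumes the standard stop codons and an ATG start.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # assumption: standard nuclear genetic code
MAX_AA = 100  # microprotein length cutoff stated in the summary


def find_smorfs(dna, max_aa=MAX_AA, min_aa=1):
    """Return (start, end, n_aa) for forward-strand ORFs encoding <= max_aa residues.

    start and end are 0-based and end-exclusive; end includes the stop codon.
    Within each frame, only the first ATG before each stop codon is reported.
    """
    dna = dna.upper()
    hits = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # open a candidate ORF at the first ATG
            elif start is not None and codon in STOP_CODONS:
                n_aa = (i - start) // 3  # residues, excluding the stop codon
                if min_aa <= n_aa <= max_aa:
                    hits.append((start, i + 3, n_aa))
                start = None  # close the ORF and look for the next ATG
    return sorted(hits)


# Toy sequence (hypothetical): ATG AAA TTT TAG in frame 2
toy = "CCATGAAATTTTAGGG"
smorfs = find_smorfs(toy)
```

A real analysis would also scan the reverse complement and compare candidates against existing annotation; the sketch only illustrates how the 100-residue cutoff separates smORFs from longer ORFs.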


Journal Details

This article was published in the following journal.

Name: Journal of Proteome Research
ISSN: 1535-3907


PubMed Articles [20184 Associated PubMed Articles listed on BioPortfolio]

The influence of transcript assembly on the proteogenomics discovery of microproteins.

Proteogenomics methods have identified many non-annotated protein-coding genes in the human genome. Many of the newly discovered protein-coding genes encode peptides and small proteins, referred to co...

WEADE: A workflow for enrichment analysis and data exploration.

Data analysis based on enrichment of Gene Ontology terms has become an important step in exploring large gene or protein expression datasets and several stand-alone or web tools exist for that purpose...

Digging for Missing Proteins Using Low-Molecular-Weight Protein Enrichment and a "Mirror Protease" Strategy.

In 2012, the Chromosome-Centric Human Proteome Project (C-HPP) launched an investigation for missing proteins (MPs) to complete the human proteome project (HPP). The majority of the MPs were distribut...

Considerations of multiple imputation approaches for handling missing data in clinical trials.

Missing data exist in all clinical trials and missing data issue is a very serious issue in terms of the interpretability of the trial results. There is no universally applicable solution for all miss...

Potentially missing data was considerably more frequent than definitely missing data in randomized controlled trials: A methodological survey.

Missing data for the outcomes of participants in randomized controlled trials (RCTs) are a key element of risk of bias assessment. However, it is not always clear from RCT reports whether some categor...

Clinical Trials [3706 Associated Clinical Trials listed on BioPortfolio]

Comparison of the Antihypertensive Efficacy of Valsartan and Enalapril After Missing One Dose

This study was designed in order to evaluate the blood pressure lowering effect of valsartan compared to enalapril over 24 hours after skipping one daily dose. Both drugs act on the renin...

A Screening and Recruitment Study in Adults Expressing Interest in the Emory Microbiota Enrichment Program

The goal of this study is to rapidly identify subjects who are eligible for the Microbiota Enrichment Program (MEP) at Emory in Atlanta, Georgia. This general screening protocol will be us...

Development of a Screening Strategy for Community-Based Adverse Drug Related Events in the Emergency Department

Adverse Drug Related Events (ADREs) are a leading cause of Emergency Department (ED) visits in Canada. However emergency physicians recognize only half of all ADREs in patients presenting ...

Acceptability of Products and Eating Pleasure in Elderly People Living at Home or in Establishment Hosting For the Dependant Elderly (EHPAD) (Old-people's Home)

In independent elderly people, the aim is to test recipes for different types of food from different countries (starter, main course with culinary aids, carrot purees, desserts and smoothi...

A Study of the Kinetics of a 13C-Cholesterol Infusate in Healthy Male Subjects (0000-108)(COMPLETED)

This is a 2-part pilot study in healthy male subjects to evaluate plasma enrichment kinetics of [13C3,4]-cholesterol (Part I) and to assess the test-retest reproducibility (Part II) of Rev...

Medical and Biotech [MESH] Definitions

The systematic study of annotated genomic information to global protein expression in order to determine the relationship between genomic sequences and both expressed proteins and predicted protein sequences.

Work consisting of the designation of an article or book as retracted in whole or in part by an author or authors or an authorized representative. It identifies a citation previously published and now retracted through a formal issuance from the author, publisher, or other authorized agent, and is distinguished from RETRACTION OF PUBLICATION, which identifies the citation retracting the original published item.

Adaptive antiviral defense mechanisms, in archaea and bacteria, based on DNA repeat arrays called CLUSTERED REGULARLY INTERSPACED SHORT PALINDROMIC REPEATS (CRISPR elements) that function in conjunction with CRISPR-ASSOCIATED PROTEINS (Cas proteins). Several types have been distinguished, including Type I, Type II, and Type III, based on signature motifs of CRISPR-ASSOCIATED PROTEINS.

Symbols or text that identifies a book as the work of a specific printer.

A conserved AMINO ACID SEQUENCE located in the intracellular domains of a family of transmembrane proteins that negatively regulate the signal transduction processes emanating from transmembrane proteins containing IMMUNORECEPTOR TYROSINE-BASED ACTIVATION MOTIFS. The CONSENSUS SEQUENCE of this motif is I(or V)LXYXXL(or V) (where X denotes any amino acid). Also known as ITIM motifs.
