Advances in protein sequencing technologies have led to the production of huge volumes of data that must be stored and transmitted. Compression can address this challenge. In this paper, we propose AC, a state-of-the-art method for lossless compression of amino acid sequences. The method is based on the cooperation between finite-context models and substitutional tolerant Markov models. Compared to several general-purpose and special-purpose protein compressors, AC provides the best bit-rates. It also compresses sequences nine times faster than its competitor, paq8l. In addition, using AC, we analyze the compressibility of a large number of sequences from different domains. The results show that viruses are the most difficult sequences to compress, archaea and bacteria are the second most difficult, and eukaryota are the easiest.
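To make the idea of a finite-context model concrete, here is a minimal sketch in Python. It is not the AC implementation: it uses a single order-k model with additive smoothing and only estimates the bit-rate an ideal arithmetic coder would reach, without producing a compressed file. The function name `fcm_bits_per_symbol`, the parameters `order` and `alpha`, and the 20-letter alphabet are assumptions made for illustration.

```python
import math
from collections import defaultdict

# Assumption: only the 20 standard amino acids occur in the input.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def fcm_bits_per_symbol(seq, order=2, alpha=1.0):
    """Estimate the bits per symbol of an adaptive order-`order` finite-context model.

    Each symbol is predicted from the `order` symbols that precede it, using
    counts gathered from the part of the sequence already seen. `alpha` is an
    additive smoothing constant (hypothetical default, not taken from the paper).
    """
    if not seq:
        raise ValueError("sequence must not be empty")

    alphabet_size = len(AMINO_ACIDS)
    counts = defaultdict(lambda: defaultdict(int))  # context -> symbol -> count
    totals = defaultdict(int)                       # context -> total count
    total_bits = 0.0

    for i, symbol in enumerate(seq):
        # Shorter contexts are used at the very start of the sequence.
        context = seq[max(0, i - order):i]
        probability = (counts[context][symbol] + alpha) / (
            totals[context] + alpha * alphabet_size
        )
        # An ideal entropy coder spends -log2(p) bits on a symbol of probability p.
        total_bits += -math.log2(probability)
        # Update the model only after coding, so a decoder could stay in sync.
        counts[context][symbol] += 1
        totals[context] += 1

    return total_bits / len(seq)


if __name__ == "__main__":
    repetitive = "ACDE" * 50
    print(f"{fcm_bits_per_symbol(repetitive):.3f} bits per symbol")
```

A highly repetitive sequence should score well below log2(20), roughly 4.32 bits per symbol, which is the cost of coding each residue with no model at all. A substitution-tolerant model would additionally keep predicting through occasional mismatches, which this sketch does not attempt.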
This article was published in the following journal.
Name: Interdisciplinary sciences, computational life sciences
Identification of regulatory elements is essential for understanding the mechanism behind regulating gene expression. These regulatory elements, located in or near the gene, bind to proteins called transcri...
Hepatitis B virus (HBV) is an important human pathogen belonging to the Hepadnaviridae family, Orthohepadnavirus genus. Over 240 million people are infected with HBV worldwide. The reverse transcripti...
High-throughput sequencing of large immune repertoires has enabled the development of methods to predict the probability of generation by V(D)J recombination of T- and B-cell receptors of any specific...
Plasmodium falciparum histidine-rich proteins 2 (PfHRP2) based RDTs are advocated in falciparum malaria-endemic regions, particularly when quality microscopy is not available. However, diversity and a...
It is imperative to understand how chemical preservation alters tissue isotopic compositions before using historical samples in ecological studies. Specifically, although compound-specific isotope ana...
This study of the tolerance and acceptability of an amino acid based feed will assess gastrointestinal (GI) tolerance, product intake and acceptability in relation to taste, smell, texture...
Protein requirements in individuals who participate in endurance-based exercise training have been suggested to be greater than the current recommended dietary allowance (RDA). The biolog...
The aim is to assess the impact of physical inactivity on muscle amino acid balance. In addition, we will evaluate how the diet and/or a pharmacological intervention designed to manipulate...
In a recent randomized, double-blind, cross-over clinical trial, serum growth hormone (hGH) increased 682% above baseline 120 minutes after oral administration of an amino acid-based dieta...
This is a safety and tolerability study investigating the effect of an amino acid formulation in healthy volunteers during and after limb immobilization.
A theoretical representative nucleotide or amino acid sequence in which each nucleotide or amino acid is the one which occurs most frequently at that site in the different sequences which occur in nature. The phrase also refers to an actual sequence which approximates the theoretical consensus. A known CONSERVED SEQUENCE set is represented by a consensus sequence. Commonly observed supersecondary protein structures (AMINO ACID MOTIFS) are often formed by conserved sequences.
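The definition above can be illustrated with a short Python sketch that takes the most frequent symbol in each column of a set of pre-aligned, equal-length sequences. The function name `consensus` and the alphabetical tie-breaking rule are assumptions made for illustration; real tools often use ambiguity codes or frequency thresholds instead.

```python
from collections import Counter


def consensus(aligned):
    """Return the most frequent symbol at each position of aligned sequences.

    Assumes the sequences are already aligned and of equal length.
    Ties are broken alphabetically (an arbitrary choice for this sketch).
    """
    if not aligned:
        raise ValueError("at least one sequence is required")
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("all sequences must have the same length")

    result = []
    for column in zip(*aligned):
        counts = Counter(column)
        best = max(counts.values())
        # Pick the alphabetically first symbol among the most frequent ones.
        result.append(min(sym for sym, n in counts.items() if n == best))
    return "".join(result)


if __name__ == "__main__":
    print(consensus(["ACDE", "ACDF", "GCDE"]))
```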
The arrangement of two or more amino acid or base sequences from an organism or organisms in such a way as to align areas of the sequences sharing common properties. The degree of relatedness or homology between the sequences is predicted computationally or statistically based on weights assigned to the elements aligned between the sequences. This in turn can serve as a potential indicator of the genetic relatedness between the organisms.
A broad-spectrum excitatory amino acid antagonist used as a research tool.
A sequence of amino acids in a polypeptide or of nucleotides in DNA or RNA that is similar across multiple species. A known set of conserved sequences is represented by a CONSENSUS SEQUENCE. AMINO ACID MOTIFS are often composed of conserved sequences.
Specific amino acid sequences present in the primary amino acid sequence of proteins which mediate their export from the CELL NUCLEUS. They are rich in hydrophobic residues, such as LEUCINE and ISOLEUCINE.
Within medicine, nutrition (the study of food and the effect of its components on the body) has many different roles. Appropriate nutrition can help prevent certain diseases, or treat others. In critically ill patients, artificial feeding by tubes need t...
DNA sequencing is the process of determining the precise order of nucleotides within a DNA molecule. During DNA sequencing, the bases of a small fragment of DNA are sequentially identified from signals emitted as each fragment is re-synthesized from a ...
Standard antiretroviral therapy (ART) consists of the combination of at least three antiretroviral (ARV) drugs to maximally suppress the HIV virus and stop the progression of HIV disease. Huge reductions have been seen in rates of death and suffering whe...