Halvade-RNA: Parallel variant calling from transcriptomic data using MapReduce.

08:00 EDT 30th March 2017 | BioPortfolio

Summary of "Halvade-RNA: Parallel variant calling from transcriptomic data using MapReduce."

Given the current cost-effectiveness of next-generation sequencing, the amount of DNA-seq and RNA-seq data generated is ever increasing. One of the primary objectives of NGS experiments is calling genetic variants. While highly accurate, most variant calling pipelines are not optimized to run efficiently on large data sets. However, as variant calling in genomic data has become common practice, several methods have been proposed to reduce runtime for DNA-seq analysis through the use of parallel computing. Determining the effectively expressed variants from transcriptomics (RNA-seq) data has only recently become possible, and as such does not yet benefit from efficiently parallelized workflows. We introduce Halvade-RNA, a parallel, multi-node RNA-seq variant calling pipeline based on the GATK Best Practices recommendations. Halvade-RNA makes use of the MapReduce programming model to create and manage parallel data streams on which multiple instances of existing tools such as STAR and GATK operate concurrently. Whereas the single-threaded processing of a typical RNA-seq sample requires ∼28h, Halvade-RNA reduces this runtime to ∼2h using a small cluster with two 20-core machines. Even on a single, multi-core workstation, Halvade-RNA can significantly reduce runtime compared to using multi-threading, thus providing for a more cost-effective processing of RNA-seq data. Halvade-RNA is written in Java and uses the Hadoop MapReduce 2.0 API. It supports a wide range of distributions of Hadoop, including Cloudera and Amazon EMR.
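The MapReduce pattern described in the summary can be illustrated with a small in-memory sketch. This is not Halvade-RNA's code and does not use the Hadoop API. The tuple layout of a read, the `REGION_SIZE` value, the fixed reference base, and all function names are assumptions made for illustration. The reduce step is a toy stand-in for running an aligner and variant caller on one region. The last function works through the runtime figures quoted above, which are approximate in the source, and assumes all 40 cores of the two machines were in use.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

REGION_SIZE = 1_000_000  # hypothetical region width in bases


def map_read(read):
    """Map step: key each read by the genomic region it falls in."""
    chrom, pos, _base = read
    return (chrom, pos // REGION_SIZE), read


def shuffle(pairs):
    """Group mapped reads by region key, as the framework would between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_region(item):
    """Reduce step: toy stand-in for per-region variant calling.

    Assumes every reference base is "A" and reports any read base that differs.
    """
    key, reads = item
    reference_base = "A"  # toy assumption, not real reference data
    variants = sorted({r for r in reads if r[2] != reference_base})
    return key, variants


def run(reads, workers=4):
    """Run map, shuffle, and reduce, processing regions concurrently."""
    grouped = shuffle(map_read(r) for r in reads)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(reduce_region, sorted(grouped.items())))


def speedup_and_efficiency(serial_hours, parallel_hours, cores):
    """Return (speedup, parallel efficiency) for a given core count."""
    speedup = serial_hours / parallel_hours
    return speedup, speedup / cores


if __name__ == "__main__":
    reads = [("chr1", 100, "A"), ("chr1", 200, "G"), ("chr1", 1_500_000, "T")]
    print(run(reads))
    # Approximate figures from the summary: ~28 h single-threaded, ~2 h on 2 x 20 cores.
    print(speedup_and_efficiency(28, 2, 40))
```

With the rounded figures from the summary, the speedup is about 14-fold on 40 cores, which corresponds to a parallel efficiency of roughly 35%.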

Journal Details

This article was published in the following journal.

Name: PloS one
ISSN: 1932-6203
Pages: e0174575

PubMed Articles [12606 Associated PubMed Articles listed on BioPortfolio]

A framework for the estimation of the proportion of true discoveries in single nucleotide variant detection studies for human data.

Any single nucleotide variant detection study could benefit from a fast and cheap method of measuring the quality of a variant call list. It is advantageous to be able to see how the call list quality i...

Comparison of INDEL Calling Tools with Simulation Data and Real Short-Read Data.

Insertions and deletions (INDELs) comprise a significant proportion of human genetic variation, and recent papers have revealed that many human diseases may be attributable to INDELs. With the develop...

appreci8: A Pipeline for Precise Variant Calling Integrating 8 Tools.

The application of next-generation sequencing in research and particularly in clinical routine requires valid variant calling results. However, evaluation of several commonly used tools has pointed ou...

novoCaller: A Bayesian network approach for de novo variant calling from pedigree and population sequence data.

De novo mutations (i.e., newly occurring mutations) are a predominant cause of sporadic dominant monogenic diseases and play a significant role in the genetics of complex disorders. De novo mutation s...

Nimbus: A design-driven analyses suite for amplicon based NGS data.

PCR-based DNA enrichment followed by massively parallel sequencing is a straightforward and cost effective method to sequence genes up to high depth. The full potential of amplicon based sequencing as...

Clinical Trials [3859 Associated Clinical Trials listed on BioPortfolio]

Skin Tape Harvesting for Transcriptomics Analysis

Transcriptomics is the study of how RNA is expressed under specific conditions. Transcriptomic analyses of lesional skin biopsies can be a useful way to track how a patient responds to a d...

Transcriptomic Profiling in Severely Injured Patients

Discovery of differences in the host response in patients with systemic inflammation and sepsis, and identification of novel, specific markers by using a longitudinal clinico-transcriptomi...

Estimate of the Activity and the Forecast of the Lupus Disease of the Adult by a Transcriptomic Score (STUDY LU-PUCE)

Systemic lupus erythematosus is an autoimmune disease whose activity remains very difficult to evaluate because of the heterogeneity of the clinical and biological symptom...

Transcriptomic and Biochemical Changes During a Migraine Attack

Despite the fact that migraine is a common disorder, the pathogenesis is still not fully elucidated. Studying transcriptomic and biochemical changes during induced and spontaneous migraine...

Low FODMAPs Diet vs. Specific Dietary Advice in Patients With IBS Diarrheal Variant

A reduced content of FODMAPs (fermentable oligosaccharides, disaccharides, monosaccharides, and polyols) in the diet may be beneficial for patients with IBS diarrheal variant, but so far f...

Medical and Biotech [MESH] Definitions

Information application based on a variety of coding methods to minimize the amount of data to be stored, retrieved, or transmitted. Data compression can be applied to various forms of data, such as images and signals. It is used to reduce costs and increase efficiency in the maintenance of large volumes of data.

Various units or machines that operate in combination or in conjunction with a computer but are not physically part of it. Peripheral devices typically display computer data, store data from the computer and return the data to the computer on demand, prepare data for human use, or acquire data from a source and convert it to a form usable by a computer. (Computer Dictionary, 4th ed.)

The science and art of collecting, summarizing, and analyzing data that are subject to random variation. The term is also applied to the data themselves and to the summarization of the data.

Systematic gathering of data for a particular purpose from various sources, including questionnaires, interviews, observation, existing records, and electronic devices. The process is usually preliminary to statistical analysis of the data.

Devices capable of receiving data, retaining data for an indefinite or finite period of time, and supplying data upon demand.

Relevant Topics

DNA sequencing
DNA sequencing is the process of determining the precise order of nucleotides within a DNA molecule. During DNA sequencing, the bases of a small fragment of DNA are sequentially identified from signals emitted as each fragment is re-synthesized from a ...

Bioinformatics
Bioinformatics is the application of computer software and hardware to the management of biological data to create useful information. Computers are used to gather, store, analyze and integrate biological and genetic information which can then be applied...