Halvade-RNA: Parallel variant calling from transcriptomic data using MapReduce.

08:00 EDT 30th March 2017 | BioPortfolio

Summary of "Halvade-RNA: Parallel variant calling from transcriptomic data using MapReduce."

Given the current cost-effectiveness of next-generation sequencing (NGS), the amount of DNA-seq and RNA-seq data generated is ever increasing. One of the primary objectives of NGS experiments is calling genetic variants. While highly accurate, most variant calling pipelines are not optimized to run efficiently on large data sets. However, as variant calling in genomic data has become common practice, several methods have been proposed to reduce runtime for DNA-seq analysis through the use of parallel computing. Determining the effectively expressed variants from transcriptomics (RNA-seq) data has only recently become possible, and as such does not yet benefit from efficiently parallelized workflows. We introduce Halvade-RNA, a parallel, multi-node RNA-seq variant calling pipeline based on the GATK Best Practices recommendations. Halvade-RNA makes use of the MapReduce programming model to create and manage parallel data streams on which multiple instances of existing tools such as STAR and GATK operate concurrently. Whereas the single-threaded processing of a typical RNA-seq sample requires ∼28 h, Halvade-RNA reduces this runtime to ∼2 h using a small cluster of two 20-core machines. Even on a single, multi-core workstation, Halvade-RNA can significantly reduce runtime compared to using multi-threading, thus providing more cost-effective processing of RNA-seq data. Halvade-RNA is written in Java and uses the Hadoop MapReduce 2.0 API. It supports a wide range of Hadoop distributions, including Cloudera and Amazon EMR.
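The MapReduce pattern mentioned in the summary can be illustrated with a minimal Python sketch. This is not Halvade-RNA's code (which is Java on Hadoop) and it runs no real tools. It assumes, purely for illustration, that map tasks process chunks of reads independently, that a shuffle step groups the results by genomic region, and that reduce tasks call variants per region. The names `map_align`, `shuffle`, `reduce_call`, `REGION_SIZE`, and `MIN_SUPPORT`, and the toy read format `(chrom, pos, ref, alt)`, are all hypothetical.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

REGION_SIZE = 1000  # hypothetical region width in bases
MIN_SUPPORT = 2     # hypothetical minimum number of reads supporting a variant


def map_align(chunk):
    """Map step (stand-in for an aligner): key each read by its genomic region."""
    return [((chrom, pos // REGION_SIZE), (chrom, pos, ref, alt))
            for chrom, pos, ref, alt in chunk]


def shuffle(mapped_chunks):
    """Shuffle step: group reads from all map tasks by region key."""
    regions = defaultdict(list)
    for pairs in mapped_chunks:
        for key, read in pairs:
            regions[key].append(read)
    return regions


def reduce_call(item):
    """Reduce step (stand-in for a variant caller): keep mismatches seen often enough."""
    _key, reads = item
    support = defaultdict(int)
    for chrom, pos, ref, alt in reads:
        if alt != ref:
            support[(chrom, pos, ref, alt)] += 1
    return sorted(v for v, n in support.items() if n >= MIN_SUPPORT)


def run_pipeline(chunks, workers=4):
    """Run map tasks concurrently, shuffle by region, then run reduce tasks concurrently."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        mapped = list(pool.map(map_align, chunks))
        regions = shuffle(mapped)
        called = list(pool.map(reduce_call, sorted(regions.items())))
    return [variant for region in called for variant in region]


if __name__ == "__main__":
    toy_chunks = [
        [("chr1", 100, "A", "G"), ("chr1", 1500, "C", "C")],
        [("chr1", 100, "A", "G"), ("chr2", 20, "T", "T")],
        [("chr1", 1500, "C", "T")],
    ]
    print(run_pipeline(toy_chunks))
```

Because each chunk and each region is handled independently, the map and reduce tasks can run on separate cores or nodes. For scale, the runtimes quoted above (∼28 h down to ∼2 h on two 20-core machines) correspond to roughly a 14-fold speedup on 40 cores.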


Journal Details

This article was published in the following journal.

Name: PLoS ONE
ISSN: 1932-6203
Pages: e0174575


PubMed Articles [12591 Associated PubMed Articles listed on BioPortfolio]

Comprehensive benchmarking of SNV callers for highly admixed tumor data.

Precision medicine attempts to individualize cancer therapy by matching tumor-specific genetic changes with effective targeted therapies. A crucial first step in this process is the reliable identific...

Direct comparison of performance of single nucleotide variant calling in human genome with alignment-based and assembly-based approaches.

Complementary to reference-based variant detection, recent studies revealed that many novel variants could be detected with de novo assembled genomes. To evaluate the effect of reads coverage and the ...

DIMM-SC: a Dirichlet mixture model for clustering droplet-based single cell transcriptomic data.

Single cell transcriptome sequencing (scRNA-Seq) has become a revolutionary tool to study cellular and molecular processes at single cell resolution. Among existing technologies, the recently develope...

Canvas SPW: calling de novo copy number variants in pedigrees.

Whole genome sequencing is becoming a diagnostic of choice for the identification of rare inherited and de novo copy number variants in families with various pediatric and late-onset genetic diseases...

Optimized detection of insertions/deletions (INDELs) in whole-exome sequencing data.

Insertion and deletion (INDEL) mutations, the most common type of structural variance, are associated with several human diseases. The detection of INDELs through next-generation sequencing (NGS) is b...

Clinical Trials [3166 Associated Clinical Trials listed on BioPortfolio]

Skin Tape Harvesting for Transcriptomics Analysis

Transcriptomics is the study of how RNA is expressed under specific conditions. Transcriptomic analyses of lesional skin biopsies can be a useful way to track how a patient responds to a d...

Transcriptomic Profiling in Severely Injured Patients

Discovery of differences in the host response in patients with systemic inflammation and sepsis, and identification of novel, specific markers by using a longitudinal clinico-transcriptomi...

Estimate of the Activity and the Forecast of the Lupus Disease of the Adult by a Transcriptomic Score (STUDY LU-PUCE)

Systemic lupus erythematosus is an autoimmune disease whose activity remains very difficult to evaluate because of the heterogeneity of its clinical and biological symptom...

Transcriptomic and Biochemical Changes During a Migraine Attack

Despite the fact that migraine is a common disorder, the pathogenesis is still not fully elucidated. Studying transcriptomic and biochemical changes during induced and spontaneous migraine...

Observation of Cough Variant Asthma Treated in Combination of Chanqin Granules.

This single-center, randomized, double-blind, placebo-controlled trial was undertaken at an outpatient clinic in Shuguang Hospital. Newly diagnosed cough variant asthma adult patients with...

Medical and Biotech [MESH] Definitions

Information application based on a variety of coding methods to minimize the amount of data to be stored, retrieved, or transmitted. Data compression can be applied to various forms of data, such as images and signals. It is used to reduce costs and increase efficiency in the maintenance of large volumes of data.

Various units or machines that operate in combination or in conjunction with a computer but are not physically part of it. Peripheral devices typically display computer data, store data from the computer and return the data to the computer on demand, prepare data for human use, or acquire data from a source and convert it to a form usable by a computer. (Computer Dictionary, 4th ed.)

The science and art of collecting, summarizing, and analyzing data that are subject to random variation. The term is also applied to the data themselves and to the summarization of the data.

Systematic gathering of data for a particular purpose from various sources, including questionnaires, interviews, observation, existing records, and electronic devices. The process is usually preliminary to statistical analysis of the data.

Devices capable of receiving data, retaining data for an indefinite or finite period of time, and supplying data upon demand.

Relevant Topics

DNA sequencing
DNA sequencing is the process of determining the precise order of nucleotides within a DNA molecule. During DNA sequencing, the bases of a small fragment of DNA are sequentially identified from signals emitted as each fragment is re-synthesized from a ...

Bioinformatics
Bioinformatics is the application of computer software and hardware to the management of biological data to create useful information. Computers are used to gather, store, analyze and integrate biological and genetic information which can then be applied...