RADProc: A computationally efficient de novo locus assembler for population studies using RADseq data.

08:00 EDT 12th October 2018 | BioPortfolio

Summary of "RADProc: A computationally efficient de novo locus assembler for population studies using RADseq data."

Restriction-site associated DNA sequencing (RADseq) is a powerful tool for genotyping individuals, but identifying loci and assigning sequence reads to them is a crucial and often challenging step. The optimal parameter settings for a de novo RADseq assembly vary between datasets and can be difficult and computationally expensive to determine. Here we introduce RADProc, a software package that uses a graph data structure to represent all sequence reads and their similarity relationships. Storing sequence-comparison results in a graph eliminates unnecessary and redundant sequence-similarity calculations. De novo locus formation for a given parameter set can then be performed on the pre-computed graph, making parameter sweeps far more efficient. RADProc also uses a clustering approach for faster nucleotide-distance calculation. The performance of RADProc compares favorably with that of the widely used Stacks software. Run-time comparisons between RADProc and Stacks for 32 different parameter settings using 20 green crab (Carcinus maenas) samples showed that RADProc took as little as 2 hours 40 minutes, compared with 78 hours for Stacks, while 16 brown trout (Salmo trutta L.) samples were processed by RADProc and Stacks in 23 hours and 263 hours, respectively. Comparisons of the de novo loci and catalogs built by the two methods demonstrate that RADProc's faster processing has little effect on the loci formed or on the results of downstream analyses based on those loci. This article is protected by copyright. All rights reserved.
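The summary does not describe RADProc's internals beyond this outline. As a minimal sketch of the general idea, assuming equal-length reads, Hamming distance as the nucleotide distance, and a single mismatch threshold `m` (a hypothetical stand-in for a parameter such as the maximum distance allowed between stacks), the Python below compares each pair of unique reads once, stores the distances as weighted graph edges, and then forms loci for every threshold in a sweep as connected components of the edges that pass it. The function names `hamming`, `build_graph` and `loci_for_threshold` are illustrative only. This is not RADProc's implementation: the all-pairs comparison is quadratic, and the single-linkage grouping ignores the extra checks real assemblers apply.

```python
# Hypothetical illustration of a pre-computed similarity graph reused across a
# parameter sweep; not RADProc's actual code.
from itertools import combinations


def hamming(a: str, b: str) -> int:
    """Number of mismatched positions between two equal-length reads."""
    if len(a) != len(b):
        raise ValueError("reads must have equal length")
    return sum(x != y for x, y in zip(a, b))


def build_graph(reads: list[str], max_m: int) -> dict[tuple[int, int], int]:
    """Compare every pair of reads once; keep edges with distance <= max_m.

    The stored edge weight is the distance itself, so later queries with a
    smaller threshold never recompute it.
    """
    edges: dict[tuple[int, int], int] = {}
    for i, j in combinations(range(len(reads)), 2):
        d = hamming(reads[i], reads[j])
        if d <= max_m:
            edges[(i, j)] = d
    return edges


def loci_for_threshold(n: int, edges: dict[tuple[int, int], int], m: int) -> list[list[int]]:
    """Group reads into loci: connected components of edges with weight <= m."""
    parent = list(range(n))

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for (i, j), d in edges.items():
        if d <= m:
            parent[find(i)] = find(j)  # union the two components

    groups: dict[int, list[int]] = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


if __name__ == "__main__":
    reads = ["ACGTACGT", "ACGTACGA", "ACGTTCGA", "TTGTACGT", "GGGGCCCC"]
    graph = build_graph(reads, max_m=2)  # distances computed once
    for m in range(3):  # sweep the threshold on the same stored graph
        print(m, loci_for_threshold(len(reads), graph, m))
    # prints:
    # 0 [[0], [1], [2], [3], [4]]
    # 1 [[0, 1, 2], [3], [4]]
    # 2 [[0, 1, 2, 3], [4]]
```

Building the graph once at the largest threshold in the sweep is the point of the design: each smaller threshold only filters edges that are already stored, so no read pair is compared twice.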

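For scale, the run times quoted in the summary correspond to the following ratios (Stacks time divided by RADProc time). For the green crab data the RADProc figure is the best case reported ("as little as"), so that ratio is an upper bound rather than a typical value:

$$
\frac{78\ \text{h}}{2\ \text{h}\ 40\ \text{min}} = \frac{78}{8/3} = 29.25,
\qquad
\frac{263\ \text{h}}{23\ \text{h}} \approx 11.4
$$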

Journal Details

This article was published in the following journal.

Name: Molecular Ecology Resources
ISSN: 1755-0998


PubMed Articles [22190 Associated PubMed Articles listed on BioPortfolio]

Practical evaluation of 11 de novo assemblers in metagenome assembly.

Next Generation Sequencing (NGS) technologies are revolutionizing the field of biology and metagenomic-based research. Since the volume of metagenomic data is typically very large, De novo metagenomic...

novoCaller: A Bayesian network approach for de novo variant calling from pedigree and population sequence data.

De novo mutations (i.e., newly occurring mutations) are a predominant cause of sporadic dominant monogenic diseases and play a significant role in the genetics of complex disorders. De novo mutation s...

Grouper: Graph-based clustering and annotation for improved de novo transcriptome analysis.

De novo transcriptome analysis using RNA-seq offers a promising means to study gene expression in non-model organisms. Yet, the difficulty of transcriptome assembly means that the contigs provided by ...

GRASShopPER-An algorithm for de novo assembly based on GPU alignments.

Next generation sequencers produce billions of short DNA sequences in a massively parallel manner, which causes a great computational challenge in accurately reconstructing a genome sequence de novo u...

Risk of de novo aneurysm formation in patients with a prior diagnosis of ruptured or unruptured aneurysm: systematic review and meta-analysis.

OBJECTIVE De novo aneurysms are rare entities periodically discovered during follow-up imaging. Little is known regarding the frequency with which these lesions form or the time course. This systemati...

Clinical Trials [3409 Associated Clinical Trials listed on BioPortfolio]

LOC387715/HTRA1 Variants in Polypoidal Choroidal Vasculopathy in a Korean Population

This study is to investigate whether variants in the LOC387715 locus and the HtrA serine peptidase 1 (HTRA1) gene within the 10q26 locus are associated with polypoidal choroidal vasculopat...

Lipid Research Clinics Population Studies

To conduct epidemiologic surveys of the distribution, causes, and consequences of the dyslipoproteinemias. The Population Studies include the Prevalence Study, the Family Study, and the M...

Locus of Control and Spirituality in Palliative Care Patients

Primary Objectives: 1. To determine whether the degree of spirituality/religiosity as determined by the Duke University Religion Index and Functional Assessment of Chronic Illness ...

Evaluating Therapeutic Response to Novo-TTF

This study is to assess the utility of high resolution 3D echo planar magnetic resonance spectroscopy (3D EPSI) in monitoring Novo-TTF response in glioblastoma multiforme (GBM) patients.

CCFZ533X2201 - PoC Study in de Novo Renal Transplantation

The purpose of this study is to investigate the safety, tolerability, pharmacokinetics (PK) and potential for CFZ533 to replace calcineurin inhibitors (CNI), while providing a similar rate...

Medical and Biotech [MESH] Definitions

Studies in which a number of subjects are selected from all subjects in a defined population. Conclusions based on sample results may be attributed only to the population sampled.

Studies in which the presence or absence of disease or other health-related variables are determined in each member of the study population or in a representative sample at one particular time. This contrasts with LONGITUDINAL STUDIES which are followed over a period of time.

Ongoing scrutiny of a population (general population, study population, target population, etc.), generally using methods distinguished by their practicability, uniformity, and frequently their rapidity, rather than by complete accuracy.

The proportion of one particular allele in the total of all ALLELES for one genetic locus in a breeding POPULATION.

Studies in which subsets of a defined population are identified. These groups may or may not be exposed to factors hypothesized to influence the probability of the occurrence of a particular disease or other outcome. Cohorts are defined populations which, as a whole, are followed in an attempt to determine distinguishing subgroup characteristics.

Relevant Topics

Bioinformatics
Bioinformatics is the application of computer software and hardware to the management of biological data to create useful information. Computers are used to gather, store, analyze and integrate biological and genetic information which can then be applied...

DNA sequencing
DNA sequencing is the process of determining the precise order of nucleotides within a DNA molecule. During DNA sequencing, the bases of a small fragment of DNA are sequentially identified from signals emitted as each fragment is re-synthesized from a ...