Restriction-site associated DNA sequencing (RADseq) is a powerful tool for genotyping individuals, but identifying loci and assigning sequence reads to them is a crucial and often challenging step. The optimal parameter settings for a de novo RADseq assembly vary between datasets and can be difficult and computationally expensive to determine. Here we introduce RADProc, a software package that uses a graph data structure to represent all sequence reads and their similarity relationships. Storing sequence-comparison results in a graph eliminates unnecessary and redundant sequence-similarity calculations. De novo locus formation for any given parameter set can then be performed on the pre-computed graph, which makes parameter sweeps far more efficient. RADProc also uses a clustering approach to speed up nucleotide-distance calculation. RADProc's performance compares favorably with that of the widely used Stacks software. In run-time comparisons across 32 different parameter settings, RADProc processed 20 green crab (Carcinus maenas) samples in as little as 2 hours 40 minutes, compared with 78 hours for Stacks, and 16 brown trout (Salmo trutta L.) samples in 23 hours, compared with 263 hours for Stacks. Comparisons of the de novo loci formed and the catalogs built by both methods demonstrate that RADProc's speed improvement has little effect on the loci formed or on the results of downstream analyses based on those loci. This article is protected by copyright. All rights reserved.
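The core idea described above, computing read-to-read distances once and then forming loci for many parameter values from the stored graph, can be illustrated with a minimal Python sketch. It is not RADProc's implementation. It assumes reads trimmed to equal length, uses plain Hamming distance, and forms loci as connected components over edges at or below a mismatch threshold (single-linkage grouping, loosely analogous to the maximum-mismatch setting in Stacks). The names `hamming`, `build_graph` and `form_loci` are hypothetical.

```python
from itertools import combinations


def hamming(a: str, b: str) -> int:
    """Count mismatched positions; assumes equal-length reads."""
    if len(a) != len(b):
        raise ValueError("reads must have equal length")
    return sum(x != y for x, y in zip(a, b))


def build_graph(reads, max_dist):
    """Hypothetical helper: compare every pair of unique reads ONCE and
    keep edges up to the largest threshold that will be swept."""
    nodes = sorted(set(reads))  # identical reads collapse into one node
    edges = []
    for a, b in combinations(nodes, 2):
        d = hamming(a, b)
        if d <= max_dist:
            edges.append((d, a, b))
    return nodes, edges


def form_loci(nodes, edges, threshold):
    """Hypothetical helper: group nodes into loci as connected components,
    using only stored edges whose distance is <= threshold (no new comparisons)."""
    parent = {n: n for n in nodes}

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for d, a, b in edges:
        if d <= threshold:
            ra, rb = find(a), find(b)
            if ra != rb:
                parent[ra] = rb

    groups = {}
    for n in nodes:
        groups.setdefault(find(n), []).append(n)
    return sorted(sorted(g) for g in groups.values())


if __name__ == "__main__":
    reads = ["ACGTACGT", "ACGTACGA", "ACGTACGT", "TTTTACGT", "TTTTACGA", "GGGGCCCC"]
    nodes, edges = build_graph(reads, max_dist=3)  # distances computed once
    for m in (0, 1, 2, 3):  # parameter sweep reuses the same edges
        print(m, len(form_loci(nodes, edges, m)))  # prints 0 5, 1 3, 2 3, 3 2
```

Only `build_graph` performs pairwise comparisons; each additional threshold in the sweep costs just one pass over the stored edges, which is the kind of saving attributed above to the pre-computed graph. The all-pairs loop in this sketch is quadratic, and RADProc's clustering approach to faster distance calculation is not reproduced here.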
This article was published in the journal Molecular Ecology Resources.