Genomic variations in a reference collection are naturally represented as genome variation graphs. Such graphs encode common subsequences as vertices, and the variations are captured using additional vertices and directed edges. The resulting graphs are directed graphs, possibly with cycles. Existing algorithms for aligning sequences on such graphs make use of partial order alignment (POA) techniques that work on directed acyclic graphs (DAGs). To achieve this, acyclic extensions of the input graphs are first constructed through expensive loop unrolling steps (DAGification). Furthermore, such graph extensions can blow up considerably in size; in the worst case, the blow-up factor is proportional to the input sequence length. We provide a novel alignment algorithm, V-ALIGN, that aligns the input sequence directly on the input graph while avoiding such expensive DAGification steps. V-ALIGN is based on a novel dynamic programming (DP) formulation that allows gapped alignment directly on the input graph. It supports affine and linear gaps. We also propose refinements to V-ALIGN for better performance in practice. With the proposed refinements, the time to fill the DP table has linear dependence on the sizes of the sequence, the graph, and its feedback vertex set. We conducted experiments to compare the proposed algorithm against the existing POA-based techniques. We also performed alignment experiments on the genome variation graphs constructed from the 1000 Genomes data. For aligning short sequences, standard approaches restrict the expensive gapped alignment to small filtered subgraphs having high similarity to the input sequence. In such cases, the performance of V-ALIGN for gapped alignment on the filtered subgraph depends on the subgraph sizes.
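To make the idea of gapped alignment directly on a cyclic graph concrete, here is a minimal sketch. It is not V-ALIGN itself and does not use a feedback vertex set. It is a generic row-by-row DP with linear gap costs in which the within-row "skip a vertex" dependencies, which are what cycles complicate, are resolved by repeated relaxation until nothing changes. The function name `align_to_graph`, the one-character-per-vertex labels, the unit costs, and the free choice of start and end vertex are all assumptions made for illustration.

```python
def align_to_graph(query, labels, edges, mismatch=1, gap=1):
    """Minimum edit cost of aligning `query` to some path in a directed graph.

    Illustrative sketch only (not the V-ALIGN algorithm):
    - labels[v] is the single character carried by vertex v (assumption)
    - edges is a list of (u, v) pairs; cycles are allowed
    - the path may start and end at any vertex; gap must be > 0
    """
    n = len(labels)
    preds = [[] for _ in range(n)]
    for u, v in edges:
        preds[v].append(u)

    prev = None  # DP row for query[:i-1]; entry v = best cost with path ending at v
    for i in range(len(query) + 1):
        start_here = i * gap  # cost of inserting query[:i] before the path begins
        cur = [0] * n
        for v in range(n):
            best = start_here + gap  # path starts at v and v is skipped
            if i > 0:
                sub = 0 if query[i - 1] == labels[v] else mismatch
                # Align query[i-1] to v, arriving from a predecessor or starting at v.
                arrive = min([(i - 1) * gap] + [prev[u] for u in preds[v]])
                best = min(best, arrive + sub)
                # Leave query[i-1] unaligned (insertion) while staying at v.
                best = min(best, prev[v] + gap)
            cur[v] = best

        # Skipping vertices creates dependencies inside the same row, and these
        # can be cyclic. Relax along edges until a fixed point is reached; with
        # gap > 0 every update strictly lowers a bounded value, so this stops.
        changed = True
        while changed:
            changed = False
            for v in range(n):
                for u in preds[v]:
                    if cur[u] + gap < cur[v]:
                        cur[v] = cur[u] + gap
                        changed = True
        prev = cur

    # Aligning the whole query to an empty path is also allowed.
    return min([len(query) * gap] + prev)


# Toy graph: A -> C -> G -> T with a back edge G -> C forming a cycle.
labels = ["A", "C", "G", "T"]
edges = [(0, 1), (1, 2), (2, 1), (2, 3)]
print(align_to_graph("ACGCGT", labels, edges))  # traverses the cycle once
```

In this toy graph the query `ACGCGT` follows the back edge once without any unrolling of the cycle. The relaxation loop is the simplest way to handle cyclic dependencies and is not tuned for speed; the abstract's stated linear dependence on the feedback vertex set size refers to V-ALIGN's own refinements, which this sketch does not reproduce.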
This article was published in the following journal.
Name: Journal of computational biology : a journal of computational molecular cell biology
The Graphical Fragment Assembly (GFA) formats are emerging standard formats for the representation of sequence graphs. While GFA 1 was primarily targeting assembly graphs, the newer GFA 2 format intro...
Reference genomes guide our interpretation of DNA sequence data. However, conventional linear references represent only one version of each locus, ignoring variation in the population. Poor representa...
Optical maps are high resolution restriction maps that give a unique numeric representation to a genome. Used in concert with sequence reads, they provide a useful tool for genome assembly and for dis...
Most evolutionary analyses are based on pre-estimated multiple sequence alignment. Wong et al. established the existence of an uncertainty induced by multiple sequence alignment when reconstructing ph...
Protein or DNA motifs are sequence regions which possess biological importance. These regions are often highly conserved among homologous sequences. The generation of multiple sequence alignments (MSA...
The aim of this study is to evaluate postoperative knee function after total knee arthroplasty performed according to the anatomical alignment and compare these results to those of a match...
This investigation is intended to provide clinical information about alignment using TruMatch™ and to compare the results to a conventional total knee replacement. TruMatch™ will be c...
To evaluate pharmacokinetic properties and drug interactions between D326 and D337 co-administered groups, the CKD-828 alone and the total co-administered groups.
As many as 20% of patients are unhappy with the results of total knee replacement (TKR). Various changes to surgical technique have tried to address this but have not led to a significant ...
When carrying out a knee replacement operation one of the goals is to correct any deformity of the leg (bowlegged or knock kneed). The ideal alignment is the mechanical axis, which is a li...
An isothermal in-vitro nucleotide amplification process. The process involves the concomitant action of an RNA-DIRECTED DNA POLYMERASE, a ribonuclease (RIBONUCLEASES), and DNA-DIRECTED RNA POLYMERASES to synthesize large quantities of sequence-specific RNA and DNA molecules.
The first nucleotide of a transcribed DNA sequence where RNA polymerase (DNA-DIRECTED RNA POLYMERASE) begins synthesizing the RNA transcript.
Information presented in graphic form, for example, graphs or diagrams.
The arrangement of two or more amino acid or base sequences from an organism or organisms in such a way as to align areas of the sequences sharing common properties. The degree of relatedness or homology between the sequences is predicted computationally or statistically based on weights assigned to the elements aligned between the sequences. This in turn can serve as a potential indicator of the genetic relatedness between the organisms.
Graphs representing sets of measurable, non-covalent physical contacts with specific PROTEINS in living organisms or in cells.
Bioinformatics is the application of computer software and hardware to the management of biological data to create useful information. Computers are used to gather, store, analyze and integrate biological and genetic information which can then be applied...