Tandem repeat (TR) expansions have been implicated in dozens of genetic diseases, including Huntington's Disease, Fragile X Syndrome, and hereditary ataxias. Furthermore, TRs have recently been implicated in a range of complex traits, including gene expression and cancer risk. While the human genome harbors hundreds of thousands of TRs, analysis of TR expansions has been mainly limited to known pathogenic loci. A major challenge is that expanded repeats are beyond the read length of most next-generation sequencing (NGS) datasets and are not profiled by existing genome-wide tools. We present GangSTR, a novel algorithm for genome-wide genotyping of both short and expanded TRs. GangSTR extracts information from paired-end reads into a unified model to estimate maximum likelihood TR lengths. We validate GangSTR on real and simulated data and show that GangSTR outperforms alternative methods in both accuracy and speed. We apply GangSTR to a deeply sequenced trio to profile the landscape of TR expansions in a healthy family and validate novel expansions using orthogonal technologies. Our analysis reveals that healthy individuals harbor dozens of long TR alleles not captured by current genome-wide methods. GangSTR will likely enable discovery of novel disease-associated variants not currently accessible from NGS.
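The read-length limitation described in the abstract can be illustrated with simple arithmetic. The sketch below is not GangSTR's model. It only checks whether a single read could cover an entire repeat tract plus a short stretch of flanking sequence on each side. The function name, the 150 bp read length, and the 10 bp minimum flank are assumptions chosen for illustration and do not come from the paper.

```python
def can_span_repeat(read_len: int, motif_len: int, copies: int, min_flank: int = 10) -> bool:
    """Return True if one read could cover the whole repeat tract plus
    `min_flank` bases of unique sequence on each side.

    All parameters are illustrative assumptions, not values from the paper.
    """
    repeat_len = motif_len * copies  # total length of the repeat tract in bases
    return read_len >= repeat_len + 2 * min_flank


if __name__ == "__main__":
    # Hypothetical 150 bp reads and a trinucleotide (3 bp) motif such as CAG.
    for copies in (20, 40, 60):
        print(copies, can_span_repeat(150, 3, copies))
```

Under these assumed values, a tract of 60 trinucleotide copies (180 bp) is already longer than the read, so no single read can span it. This is the situation in which a method has to draw on other information from paired-end reads.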
This article was published in the following journal.
Name: Nucleic Acids Research
The GGGGCC repeat expansion in the C9orf72 gene was recently identified as a major cause of amyotrophic lateral sclerosis (ALS) and frontotemporal dementia (FTD) in several European populations. The o...
CTG expansions in the gene responsible for myotonic dystrophy type 1 (DM1) are characterized by pronounced somatic instability. A large proportion of the variability in somatic instability is explained by expansio...
We report an inborn error of metabolism caused by an expansion of a GCA-repeat tract in the 5' untranslated region of the gene encoding glutaminase that was identified through detailed clinical and...
At least 40 human diseases are associated with repeat expansions; yet, the mutational origin and instability mechanisms remain unknown for most of them. Previously, genetic epidemiology and predisposi...
The emergence of neuronal diversity during development of the nervous system relies on dynamic changes in the epigenetic landscape of neural stem cells and their progeny. Targeted DamID (TaDa) is prov...
Previous studies have indicated that abnormal DNA methylation frequently occurs in the mucosa in Crohn's disease. Comprehensive DNA methylation profiling of the inflamed and non-inflamed i...
The aim of this study was to elucidate the genetic susceptibility of patients with nontuberculous mycobacterial lung disease using a genome-wide association study.
The objective of this study is to utilize trophoblast cells accumulating in the endocervical canal at the beginning of pregnancy for non-invasive prenatal testing. If we are able to valida...
The aim of this study is to establish large tissue sections for 10 kinds of tumors in order to observe the tumor landscape under a microscope. The tumors include esophageal carcinoma, gastric...
The next generation of personalized medical treatment based on personal genetic information is evolving rapidly. Genome analysis needs systematic infrastructure and database ba...
Tandem arrays of moderately repetitive (5-50 repeats) short (10-60 bases) DNA sequences found dispersed throughout the genome and clustered near telomeres. Their degree of repetition is two to several hundred at each locus. Loci number in the thousands but each locus shows a distinctive repeat unit. Minisatellite repeats are often called variable number of tandem repeats.
Copies of DNA sequences which lie adjacent to each other in the same orientation (direct tandem repeats) or in the opposite direction to each other (INVERTED TANDEM REPEATS).
Sequences of DNA or RNA that occur in multiple copies. There are several types: INTERSPERSED REPETITIVE SEQUENCES are copies of transposable elements (DNA TRANSPOSABLE ELEMENTS or RETROELEMENTS) dispersed throughout the genome. TERMINAL REPEAT SEQUENCES flank both ends of another sequence, for example, the long terminal repeats (LTRs) on RETROVIRUSES. Variations may be direct repeats, those occurring in the same direction, or inverted repeats, those opposite to each other in direction. TANDEM REPEAT SEQUENCES are copies which lie adjacent to each other, direct or inverted (INVERTED REPEAT SEQUENCES).
Membrane glycosylphosphatidylinositol-anchored glycoproteins that may aggregate into rod-like structures. The prion protein (PRNP) gene is characterized by five TANDEM REPEAT SEQUENCES that encode a highly unstable protein region of five octapeptide repeats. Mutations in the repeat region and elsewhere in this gene are associated with CREUTZFELDT-JAKOB DISEASE; FATAL FAMILIAL INSOMNIA; GERSTMANN-STRAUSSLER DISEASE; Huntington disease-like 1, and KURU.
Copies of a nucleic acid sequence that are arranged in opposing orientations. They may lie adjacent to each other (tandem) or be separated by some sequence that is not part of the repeat (hyphenated). They may be true palindromic repeats, i.e. they read the same backwards as forwards, or complementary repeats, which read as the base complement in the opposite orientation. Complementary inverted repeats have the potential to form hairpin loop or stem-loop structures, which result in cruciform structures (such as CRUCIFORM DNA) when the complementary inverted repeats occur in double stranded regions.
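The repeat definitions above can be made concrete with a short sketch. It counts back-to-back copies of a motif (a direct tandem repeat) and tests the two kinds of inverted repeat described: a true palindrome, and a complementary repeat that equals its own reverse complement. The function names and example sequences are hypothetical and chosen only for illustration.

```python
# Translation table for the four DNA bases; other characters are left unchanged.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the base complement of `seq`, read in the opposite orientation."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def count_tandem_copies(seq: str, motif: str) -> int:
    """Return the longest run of adjacent, same-orientation copies of `motif`."""
    seq, motif = seq.upper(), motif.upper()
    if not motif:
        raise ValueError("motif must not be empty")
    best = 0
    step = len(motif)
    for start in range(len(seq) - step + 1):
        run = 0
        pos = start
        # Extend the run while the next motif-sized window matches.
        while seq.startswith(motif, pos):
            run += 1
            pos += step
        best = max(best, run)
    return best


def is_true_palindrome(seq: str) -> bool:
    """True if the sequence reads the same backwards as forwards."""
    seq = seq.upper()
    return seq == seq[::-1]


def is_complementary_inverted_repeat(seq: str) -> bool:
    """True if the sequence equals its own reverse complement."""
    return seq.upper() == reverse_complement(seq)


if __name__ == "__main__":
    print(count_tandem_copies("TTCAGCAGCAGTT", "CAG"))
    print(is_true_palindrome("GATTAG"))
    print(is_complementary_inverted_repeat("GAATTC"))
```

The motif scan tries every start position, which is quadratic in the worst case but keeps the logic easy to follow. Dedicated tools use far more efficient methods and tolerate imperfect copies.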
Bioinformatics is the application of computer software and hardware to the management of biological data to create useful information. Computers are used to gather, store, analyze and integrate biological and genetic information which can then be applied...
Huntington's disease is a hereditary disease caused by a defect in a single gene on Chromosome 4 that is inherited in an autosomal dominant fashion. The defect causes a part of DNA, called a CAG repeat, to occur many more times than it is supposed to...
The process of gene expression is used by eukaryotes, prokaryotes, and viruses to generate the macromolecular machinery for life. Steps in the gene expression process may be modulated, including the transcription, RNA splicing, translation, and post-tran...