Low-dimensional representation of genomic sequences.

08:00 EDT 30th March 2019 | BioPortfolio

Summary of "Low-dimensional representation of genomic sequences."

Numerous data analysis and data mining techniques require that data be embedded in a Euclidean space. When faced with symbolic datasets, particularly biological sequence data produced by high-throughput sequencing assays, conventional embedding approaches like binary and k-mer count vectors may be too high-dimensional or too coarse-grained to learn from the data effectively. Other representation techniques such as Multidimensional Scaling (MDS) and Node2Vec may be inadequate for large datasets, as they require recomputing the full embedding from scratch when faced with new, unclassified data. To overcome these issues we amend the graph-theoretic notion of "metric dimension" to that of "multilateration." Much like trilateration can be used to represent points in the Euclidean plane by their distances to three non-collinear points, multilateration allows us to represent any node in a graph by its distances to a subset of nodes. Unfortunately, the problem of determining a minimal subset, and hence the lowest-dimensional embedding, is NP-complete for general graphs. However, by specializing to Hamming graphs, which are particularly well suited to representing biological sequences, we can readily generate low-dimensional embeddings to map sequences of arbitrary length to a real space. As a proof of concept, we use MDS, Node2Vec, and multilateration-based embeddings to classify DNA 20-mers centered at intron-exon boundaries. Although these different techniques perform comparably, MDS and Node2Vec potentially suffer from scalability issues with increasing sequence length, whereas multilateration provides an efficient means of mapping long genomic sequences.
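
The idea of multilateration on a Hamming graph can be illustrated with a minimal Python sketch. It is not the authors' construction: the function names (`hamming`, `multilaterate`, `is_resolving`) and the landmark sets are hypothetical, chosen only for illustration. Each sequence is mapped to its vector of Hamming distances to a fixed list of landmark sequences, and a brute-force check (feasible only for very short sequences) tests whether every sequence of length k receives a distinct vector.

```python
from itertools import product


def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))


def multilaterate(seq, landmarks):
    """Embed seq as the tuple of its Hamming distances to each landmark."""
    return tuple(hamming(seq, ref) for ref in landmarks)


def is_resolving(landmarks, k, alphabet="ACGT"):
    """Brute-force check that all length-k sequences get distinct vectors.

    Enumerates len(alphabet)**k sequences, so this is only practical
    for small k; it is a sanity check, not a scalable algorithm.
    """
    seen = set()
    for chars in product(alphabet, repeat=k):
        vec = multilaterate("".join(chars), landmarks)
        if vec in seen:
            return False
        seen.add(vec)
    return True


if __name__ == "__main__":
    # Hypothetical landmark set for DNA 2-mers (an assumption for this demo).
    landmarks = ["AA", "CA", "GA", "AC", "AG"]
    print(multilaterate("GT", landmarks))
    print(is_resolving(landmarks, k=2))
```

In this sketch a new sequence is embedded by computing only its distances to the landmarks, so previously embedded sequences are left untouched. This matches the contrast with MDS and Node2Vec drawn above. The dimension of the embedding equals the number of landmarks. Finding a smallest landmark set is the hard part and is not attempted here.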


Journal Details

This article was published in the following journal.

Name: Journal of mathematical biology
ISSN: 1432-1416





