Track topics on Twitter Track topics that are important to you
One of the important approaches of handling data heterogeneity in multimodal data clustering is modeling each modality using a separate similarity graph. Information from the multiple graphs is integrated by combining them into a unified graph. A major challenge here is how to preserve cluster information while removing noise from individual graphs. In this regard, a novel algorithm, termed as CoALa, is proposed that integrates noise-free approximations of multiple similarity graphs. The proposed method first approximates a graph using the most informative eigenpairs of its Laplacian which contain cluster information. The approximate Laplacians are then integrated for the construction of a low-rank subspace that best preserves overall cluster information of multiple graphs. However, this approximate subspace differs from the full-rank subspace which integrates information from all the eigenpairs of each Laplacian. Matrix perturbation theory is used to theoretically evaluate how far approximate subspace deviates from the full-rank one for a given value of approximation rank. Finally, spectral clustering is performed on the approximate subspace to identify the clusters. Experimental results on several real-life cancer and benchmark data sets demonstrate that the proposed algorithm significantly and consistently outperforms state-of-the-art integrative clustering approaches.
This article was published in the following journal.
Name: IEEE transactions on pattern analysis and machine intelligence
This paper proposes graph Laplacian regularization for robust estimation of optical flow. First, we analyze the spectral properties of dense graph Laplacians and show that dense graphs achieve a bette...
Graph embedding aims to transfer a graph into vectors to facilitate subsequent graph-analytics tasks like link prediction and graph clustering. Most approaches on graph embedding focus on preserving t...
Spectral clustering plays a significant role in applications that rely on multi-view data due to its well-defined mathematical framework and excellent performance on arbitrarily-shaped clusters. Unfor...
Retrieving nearest neighbors across correlated data in multiple modalities, such as image-text pairs on Facebook and video-tag pairs on YouTube, has become a challenging task due to the huge amount of...
Robust Principal Component Analysis (RPCA) is a powerful tool in machine learning and data mining problems. However, in many real-world applications, RPCA is unable to well encode the intrinsic geomet...
The primary objective of this study is to describe the relative distances between patient clinical profiles (i.e. patient clustering) in multivariate space.
In this study, measurements of fetal and maternal head circumference will be collected. This data will be presented in table or graph form. The effect of maternal head circumference on fet...
The NFPTR was established in 1994 to find the causes of pancreatic cancer. In brief, the investigators are interested in both the genetic and non-genetic causes of pancreatic cancer. The i...
This crossover design experimental study aims to compare the physiological stability of premature newborns during and after a cluster of care compared to a period when they receive standar...
Due to its high prevalence and the substantial individual and socio-economic burden chronic pain is a huge challenge for patients, physicians and the society. Using neuroimaging structural...
Signal and data processing method that uses decomposition of wavelets to approximate, estimate, or compress signals with finite time and frequency domains. It represents a signal or data in terms of a fast decaying wavelet series from the original prototype wavelet, called the mother wavelet. This mathematical algorithm has been adopted widely in biomedical disciplines for data and signal processing in noise removal and audio/image compression (e.g., EEG and MRI).
Information application based on a variety of coding methods to minimize the amount of data to be stored, retrieved, or transmitted. Data compression can be applied to various forms of data, such as images and signals. It is used to reduce costs and increase efficiency in the maintenance of large volumes of data.
Various units or machines that operate in combination or in conjunction with a computer but are not physically part of it. Peripheral devices typically display computer data, store data from the computer and return the data to the computer on demand, prepare data for human use, or acquire data from a source and convert it to a form usable by a computer. (Computer Dictionary, 4th ed.)
The science and art of collecting, summarizing, and analyzing data that are subject to random variation. The term is also applied to the data themselves and to the summarization of the data.
Systematic gathering of data for a particular purpose from various sources, including questionnaires, interviews, observation, existing records, and electronic devices. The process is usually preliminary to statistical analysis of the data.
Cancer is not just one disease but many diseases. There are more than 100 different types of cancer. Most cancers are named for the organ or type of cell in which they start - for example, cancer that begins in the colon is called colon cancer; cancer th...
Standard antiretroviral therapy (ART) consists of the combination of at least three antiretroviral (ARV) drugs to maximally suppress the HIV virus and stop the progression of HIV disease. Huge reductions have been seen in rates of death and suffering whe...