A New Data Analysis Method Based on Feature Linear Combination.

08:00 EDT 6th April 2019 | BioPortfolio

Summary of "A New Data Analysis Method Based on Feature Linear Combination."

In biological data, feature relationships are complex and diverse, and they can reflect physiological and pathological changes. Defining simple and efficient classification rules based on feature relationships helps to discriminate between conditions and to study disease mechanisms. A popular data analysis method, k top scoring pairs (k-TSP), explores feature relationships by focusing on how the relative level of two features differs between groups, and classifies samples accordingly. To define more efficient classification rules, we propose a new data analysis method based on the linear combination of k > 0 top scoring pairs (LC-k-TSP). LC-k-TSP applies a support vector machine (SVM) to define the best linear relationship for each feature pair, scores feature pairs by the discriminative ability of the corresponding linear combinations, and selects k disjoint top scoring pairs to construct an ensemble classifier. Experiments on twelve public datasets showed that LC-k-TSP is superior to k-TSP, which evaluates the relationship between every two features in the same way. The experiments also showed that LC-k-TSP performed similarly to SVM and random forest (RF) in accuracy rate. Because LC-k-TSP learns a unique linear combination for each feature pair and defines simple classification rules, its results lend themselves to biomedical interpretation. Finally, we applied LC-k-TSP to hepatocellular carcinoma (HCC) metabolomics data to define simple classification rules for discriminating between liver diseases. It obtained accuracy rates of 89.76% in distinguishing small HCC from hepatic cirrhosis (CIR) and 89.13% in distinguishing HCC from CIR, compared with 87.99% and 80.35%, respectively, for k-TSP. Hence, defining classification rules based on feature relationships is an effective way to analyze biological data.
LC-k-TSP, which evaluates each feature pair by its own best linear relationship, outperforms k-TSP, which evaluates every pair by the same linear relationship.
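
To make the baseline concrete, the following is a minimal Python sketch of classic k-TSP as the summary describes it: score every feature pair by how differently the ordering x_i < x_j occurs in the two groups, keep k disjoint top-scoring pairs, and classify by majority vote. It is not the authors' implementation, and it does not implement LC-k-TSP, which would replace the fixed comparison x_i < x_j with a per-pair linear rule fitted by an SVM. The function names and the toy data are invented for illustration.

```python
from itertools import combinations


def tsp_score(X, y, i, j):
    """Return (score, lt_means_zero) for the feature pair (i, j).

    score is |P(x_i < x_j | class 0) - P(x_i < x_j | class 1)|.
    lt_means_zero is True when x_i < x_j is more frequent in class 0.
    Assumes both classes (labels 0 and 1) are present in y.
    """
    p = []
    for c in (0, 1):
        rows = [x for x, label in zip(X, y) if label == c]
        p.append(sum(x[i] < x[j] for x in rows) / len(rows))
    return abs(p[0] - p[1]), p[0] > p[1]


def fit_ktsp(X, y, k):
    """Select up to k disjoint top-scoring pairs (no feature is reused)."""
    scored = []
    for i, j in combinations(range(len(X[0])), 2):
        score, lt_means_zero = tsp_score(X, y, i, j)
        scored.append((score, i, j, lt_means_zero))
    scored.sort(key=lambda t: -t[0])  # highest score first

    used, pairs = set(), []
    for _, i, j, lt_means_zero in scored:
        if i in used or j in used:
            continue  # enforce disjoint pairs
        pairs.append((i, j, lt_means_zero))
        used.update((i, j))
        if len(pairs) == k:
            break
    return pairs


def predict_ktsp(pairs, x):
    """Majority vote over the selected pairs; a tie goes to class 0."""
    votes_for_one = 0
    for i, j, lt_means_zero in pairs:
        predicted = 0 if (x[i] < x[j]) == lt_means_zero else 1
        votes_for_one += predicted
    return 1 if 2 * votes_for_one > len(pairs) else 0


# Invented toy data: feature 0 < feature 1 in class 0, reversed in class 1.
X = [
    [1.0, 2.0, 5.0, 4.0],
    [0.5, 1.5, 3.0, 6.0],
    [1.2, 3.0, 2.0, 1.0],
    [2.0, 1.0, 4.0, 5.0],
    [3.0, 0.5, 6.0, 2.0],
    [2.5, 1.1, 1.0, 3.0],
]
y = [0, 0, 0, 1, 1, 1]

pairs = fit_ktsp(X, y, k=1)
print(pairs)
print(predict_ktsp(pairs, [0.1, 0.9, 0.0, 0.0]))
print(predict_ktsp(pairs, [0.9, 0.1, 0.0, 0.0]))
```

An odd k avoids tied votes. Under the approach the summary describes, each selected pair would carry its own learned coefficients in place of the single shared comparison used here.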


Journal Details

This article was published in the following journal.

Name: Journal of biomedical informatics
ISSN: 1532-0480
Pages: 103173


PubMed Articles [47530 Associated PubMed Articles listed on BioPortfolio]

Learning a discriminant graph-based embedding with feature selection for image categorization.

Graph-based embedding methods are very useful for reducing the dimension of high-dimensional data and for extracting their relevant features. In this paper, we introduce a novel nonlinear method calle...

Performance comparison of linear and non-linear feature selection methods for the analysis of large survey datasets.

Large survey databases for aging-related analysis are often examined to discover key factors that affect a dependent variable of interest. Typically, this analysis is performed with methods assuming l...

Improving the Accuracy of Feature Selection in Big Data Mining Using Accelerated Flower Pollination (AFP) Algorithm.

In recent times, the main problem associated with big data analytics is its high dimensional data over the search space. Such data gathers continuously in search space making traditional algorithms in...

Bayesian generalized biclustering analysis via adaptive structured shrinkage.

Biclustering techniques can identify local patterns of a data matrix by clustering feature space and sample space at the same time. Various biclustering methods have been proposed and successfully app...

Predicting microsleep states using EEG inter-channel relationships.

A microsleep is a brief and involuntary sleep-related loss of consciousness of up to 15 s. We investigated the performances of seven pairwise inter-channel relationships - covariance, Pearson's correl...

Clinical Trials [15044 Associated Clinical Trials listed on BioPortfolio]

Diaphragmatic Motion Using Linear Ultrasound

This study evaluates the movement of the diaphragm (which is the main muscle used for breathing). It will compare two ultrasound modalities: linear ultrasound versus curvilinear ultrasound...

Retrospective Study of the Linear™ Hip

The purpose of this study is to evaluate the use and efficacy of the Encore Linear™ Hip System in a group of 200 patients for whom data has already been collected.

CPAP Device In-lab Assessment NZ

The purpose of this trial is to assess device performance against participants in an overnight study to ensure the product meets user and clinical requirements

Quest Sound Recover (SR2) vs. Venture SR2

The goal of this study is to determine the benefit of an improved feature on a new hearing aid platform. To investigate the improvements, this feature is compared on a new and older hearing...

Free Text Prediction Algorithm for Appendicitis

Computer-aided diagnostic software has been used to assist physicians in various ways. Text-based prediction algorithms have been trained on past medical records through data mining and fe...

Medical and Biotech [MESH] Definitions

A method of chemical analysis based on the detection of characteristic radionuclides following a nuclear bombardment. It is also known as radioactivity analysis. (McGraw-Hill Dictionary of Scientific and Technical Terms, 4th ed)

The statistical manipulation of hierarchically and non-hierarchically nested data. It includes clustered data, such as a sample of subjects within a group of schools. Prevalent in the social, behavioral sciences, and biomedical sciences, both linear and nonlinear regression models are applied.

Statistical formulations or analyses which, when applied to data and found to fit the data, are then used to verify the assumptions and parameters used in the analysis. Examples of statistical models are the linear model, binomial model, polynomial model, two-parameter model, etc.

Signal and data processing method that uses decomposition of wavelets to approximate, estimate, or compress signals with finite time and frequency domains. It represents a signal or data in terms of a fast decaying wavelet series from the original prototype wavelet, called the mother wavelet. This mathematical algorithm has been adopted widely in biomedical disciplines for data and signal processing in noise removal and audio/image compression (e.g., EEG and MRI).

Information application based on a variety of coding methods to minimize the amount of data to be stored, retrieved, or transmitted. Data compression can be applied to various forms of data, such as images and signals. It is used to reduce costs and increase efficiency in the maintenance of large volumes of data.

Relevant Topics

Biological Therapy
Biological therapy involves the use of living organisms, substances derived from living organisms, or laboratory-produced versions of such substances to treat disease. Some biological therapies for cancer use vaccines or bacteria to stimulate the body's...

Hepatology
Hepatology is the study of liver, gallbladder, biliary tree, and pancreas, and diseases associated with them. This includes viral hepatitis, alcohol damage, cirrhosis and cancer. As modern lifestyles change, with alcoholism and cancer becoming more promi...
