Effects of Distance Measure Choice on K-Nearest Neighbor Classifier Performance: A Review.

08:00 EDT 14th August 2019 | BioPortfolio

Summary of "Effects of Distance Measure Choice on K-Nearest Neighbor Classifier Performance: A Review."

The K-nearest neighbor (KNN) classifier is one of the simplest and most common classifiers, yet its performance competes with the most complex classifiers in the literature. The core of this classifier depends mainly on measuring the distance or similarity between the tested examples and the training examples. This raises a major question: which of the many available distance and similarity measures should be used for the KNN classifier? This review attempts to answer this question by evaluating the performance (measured by accuracy, precision, and recall) of the KNN using a large number of distance measures, tested on a number of real-world data sets, with and without different levels of added noise. The experimental results show that the performance of the KNN classifier depends significantly on the distance used, with large gaps between the performances of different distances. We found that a recently proposed nonconvex distance performed best on most data sets compared with the other tested distances. In addition, the performance of the KNN with this top-performing distance degraded by only about 20% when the noise level reached 90%; this also holds for most of the other distances used. This means that the KNN classifier using any of the top 10 distances tolerates noise to a certain degree. Moreover, the results show that some distances are less affected by the added noise than others.
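To make the dependence on the distance measure concrete, the following is a minimal sketch of a KNN classifier with a pluggable distance function. It is not the review's code: the function names (`knn_predict`, `euclidean`, `manhattan`, `chebyshev`) and the toy data are assumptions made for illustration, and the three distances are common textbook choices rather than a statement of which measures the review evaluated.

```python
import math
from collections import Counter

# Three common distance measures (illustrative choices, not the review's list).
def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))

def chebyshev(a, b):
    return max(abs(x - y) for x, y in zip(a, b))

def knn_predict(train_X, train_y, query, k, distance):
    """Return the majority label among the k training points nearest to query."""
    # Rank training indices by distance to the query (stable sort keeps input order on ties).
    order = sorted(range(len(train_X)), key=lambda i: distance(train_X[i], query))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]

# Hypothetical toy data: one point per class, chosen so the distances disagree.
train_X = [(5, 0), (3, 3)]
train_y = ["a", "b"]
query = (0, 0)

for name, dist in [("euclidean", euclidean),
                   ("manhattan", manhattan),
                   ("chebyshev", chebyshev)]:
    print(name, knn_predict(train_X, train_y, query, k=1, distance=dist))
```

With this toy data the point (3, 3) is nearer to the query under the Euclidean and Chebyshev distances, while (5, 0) is nearer under the Manhattan distance, so the predicted label changes with the distance alone. Real data sets would of course need more neighbors, feature scaling, and a proper train/test split.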


Journal Details

This article was published in the following journal.

Name: Big Data
ISSN: 2167-647X





