Robust Quantization for General Similarity Search.

07:00 EST 1st February 2018 | BioPortfolio

Summary of "Robust Quantization for General Similarity Search."

Recent years have witnessed the emergence of vector quantization (VQ) techniques for efficient similarity search. VQ partitions the feature space into a set of codewords and encodes data points as integer indices of those codewords. The distance between data points can then be approximated efficiently by simple memory lookup operations. Because the quantized codes are compact, both storage cost and search complexity are significantly reduced, which facilitates efficient large-scale similarity search. However, the performance of several celebrated VQ approaches degrades significantly on noisy data. In addition, these approaches can barely support a wide range of applications, because their distortion measure is limited to the $\ell_2$ norm. To address the shortcomings of the squared Euclidean ($\ell_{2,2}$-norm) loss function employed by existing VQ approaches, in this paper we propose a novel robust and general VQ framework, named RGVQ, to enhance both the robustness and the generalization of VQ approaches. Specifically, an $\ell_{p,q}$-norm loss function is proposed to conduct $\ell_p$-norm similarity search, rather than $\ell_2$-norm search, and the $q$-th order loss is used to enhance robustness. Although changing the loss function to the $\ell_{p,q}$ norm makes VQ approaches more robust and generic, it introduces a challenge: a non-smooth and non-convex $\ell_{p,q}$-norm function under orthogonality constraints has to be minimized. To solve this problem, we propose a novel and efficient optimization scheme, specialize it to VQ approaches, and theoretically prove its convergence. Extensive experiments on benchmark data sets demonstrate that RGVQ improves on the original versions of several VQ approaches, especially when searching for similar items in noisy data.
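
The lookup-based distance approximation described in the summary can be illustrated with a small sketch. It assumes product quantization (PQ), one common VQ variant, with codebooks trained by a toy k-means; the helper names and parameters (`m` sub-spaces, `k` codewords each) are hypothetical, and the squared Euclidean distortion used here is exactly the baseline loss that RGVQ sets out to replace, so this is not the paper's method.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def split(v, m):
    """Split vector v into m contiguous sub-vectors of equal length."""
    d = len(v) // m
    return [v[i * d:(i + 1) * d] for i in range(m)]

def train_codebooks(data, m, k, iters=10, seed=0):
    """Learn one k-codeword codebook per sub-space with a toy Lloyd's k-means."""
    rng = random.Random(seed)
    codebooks = []
    for j in range(m):
        subs = [split(x, m)[j] for x in data]
        centers = [list(c) for c in rng.sample(subs, k)]
        for _ in range(iters):
            groups = [[] for _ in range(k)]
            for s in subs:
                nearest = min(range(k), key=lambda c: sq_dist(s, centers[c]))
                groups[nearest].append(s)
            for c, g in enumerate(groups):
                if g:  # an empty cluster keeps its previous center
                    centers[c] = [sum(col) / len(g) for col in zip(*g)]
        codebooks.append(centers)
    return codebooks

def encode(x, codebooks):
    """Encode x as m integer indices: the nearest codeword in each sub-space."""
    parts = split(x, len(codebooks))
    return [min(range(len(cb)), key=lambda c: sq_dist(s, cb[c]))
            for s, cb in zip(parts, codebooks)]

def lookup_table(query, codebooks):
    """Precompute squared distances from each query sub-vector to every codeword."""
    parts = split(query, len(codebooks))
    return [[sq_dist(s, c) for c in cb] for s, cb in zip(parts, codebooks)]

def approx_dist(table, code):
    """Approximate squared distance to an encoded point with m table lookups."""
    return sum(table[j][idx] for j, idx in enumerate(code))

# Toy usage: 200 random 8-D points, 4 sub-spaces, 16 codewords each.
gen = random.Random(1)
data = [[gen.gauss(0.0, 1.0) for _ in range(8)] for _ in range(200)]
books = train_codebooks(data, m=4, k=16)
codes = [encode(x, books) for x in data]
query = data[0]
table = lookup_table(query, books)
ranking = sorted(range(len(data)), key=lambda i: approx_dist(table, codes[i]))
```

Each database point is stored as only four small integers, and once the query's table is built, scoring a point costs four array lookups and three additions regardless of the original dimensionality.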

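To see why a lower-order loss helps with noisy data, the sketch below assumes the $\ell_{p,q}$ loss has the form $\sum_i \|x_i - \hat{x}_i\|_p^q$, which reduces to the squared Euclidean loss when $p = q = 2$. This form, the helper names, and the grid-search fitting are illustrative assumptions, not the paper's optimization scheme. The example fits a single 1-D codeword to data containing one outlier; in one dimension $\|\cdot\|_p$ is simply the absolute value, so it isolates the effect of the order $q$.

```python
def lpq_loss(X, X_hat, p, q):
    """Assumed l_{p,q} loss: sum over points of ||x_i - x_hat_i||_p ** q."""
    total = 0.0
    for x, xh in zip(X, X_hat):
        norm_p = sum(abs(a - b) ** p for a, b in zip(x, xh)) ** (1.0 / p)
        total += norm_p ** q
    return total

def best_codeword(points, p, q, lo=0.0, hi=60.0, step=0.01):
    """Toy grid search for the single 1-D codeword minimizing the loss."""
    n = int(round((hi - lo) / step))
    grid = [lo + i * step for i in range(n + 1)]
    return min(grid, key=lambda c: lpq_loss(points, [[c]] * len(points), p, q))

# Four inliers near 1.0 plus one gross outlier at 50.0.
data = [[1.0], [1.2], [0.9], [1.1], [50.0]]
c_sq = best_codeword(data, p=2, q=2)  # squared Euclidean: the mean, dragged toward 50
c_l1 = best_codeword(data, p=1, q=1)  # first-order loss: the median, stays near 1.1
```

Under the squared loss the best codeword is the mean (about 10.84), which represents none of the inliers well; under the first-order loss it is the median (1.1), which is unaffected by how far away the outlier lies.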

Journal Details

This article was published in the following journal.

Name: IEEE transactions on image processing : a publication of the IEEE Signal Processing Society
ISSN: 1941-0042
Pages: 949-963


PubMed Articles [9392 Associated PubMed Articles listed on BioPortfolio]

The HMMER Web Server for Protein Sequence Similarity Search.

Protein sequence similarity search is one of the most commonly used bioinformatics methods for identifying evolutionarily related proteins. In general, sequences that are evolutionarily related share ...

A Massively Parallel Sequence Similarity Search for Metagenomic Sequencing Data.

Sequence similarity searches have been widely used in the analyses of metagenomic sequencing data. Finding homologous sequences in a reference database enables the estimation of taxonomic and function...

Performance Re-evaluation on "Codewords Distribution-Based Optimal Combination of Equal-Average Equal-Variance Equal-Norm Nearest Neighbor Fast Search Algorithm for Vector Quantization Encoding".

In the re-evaluated paper, Xie et al. proposed a new fast search algorithm for vector quantization encoding, which optimized the priority checking order of variance and norm inequality in order to spe...

HybridSim-VS: a web server for large-scale ligand-based virtual screening using hybrid similarity recognition techniques.

Molecular-similarity searches based on two-dimensional (2D) fingerprint and three-dimensional (3D) shape represent two widely used ligand-based virtual screening (VS) methods in computer-aided drug de...

Optimal Quantization Scheme for Data-Efficient Target Tracking via UWSNs Using Quantized Measurements.

Target tracking is one of the broad applications of underwater wireless sensor networks (UWSNs). However, as a result of the temporal and spatial variability of acoustic channels, underwater acoustic ...

Clinical Trials [2256 Associated Clinical Trials listed on BioPortfolio]

Phase 1 Study of Trastuzumab Administered as a Single Intravenous Infusion

The primary objective of this study was to demonstrate pharmacokinetic similarity of Mylan trastuzumab (Hercules) versus EU-approved Herceptin® and US-licensed Herceptin® and pharmacokin...

Assessment of the HIV CNS Reservoir, Neurological and Neuro-cognitive Effects, and Source of Rebound HIV in CNS

This study aims to describe in depth the CNS, CNS HIV reservoir and CNS viral rebound in consenting SEARCH 019 subjects prior to, during and after the SEARCH 019 study intervention (VHM + ...

ROBUST I Pilot Study

Robust I study is a feasibility study about safety and efficacy of DCB.

Google Health Search Trial

There are large differences in knowledge between patients and healthcare providers (i.e. physicians, physician assistants and nurse practitioners), and there is a strong interest on the pa...

The Rapid Study: Randomized Phase II Study To Expedite Allogeneic Transplant With Immediate Haploidentical Plus Unrelated Cord Donor Search Versus Matched Unrelated Donor Search For AML And High-Risk MDS Patients

The study seeks to compare time from formal search to hematopoietic cell transplantation (HCT) for patients 18 years and older, randomized between haplo-cord search and matched unrelated d...

Medical and Biotech [MESH] Definitions

Measurable biological (physiological, biochemical, and anatomical features), behavioral (psychometric pattern) or cognitive markers that are found more often in individuals with a disease than in the general population. Because many endophenotypes are present before the disease onset and in individuals with heritable risk for disease such as unaffected family members, they can be used to help diagnose and search for causative genes.

A love or pursuit of wisdom. A search for the underlying causes and principles of reality. (Webster, 3d ed)

The systematic search and discovery of natural substances which may have potential commercial applications.

Large, robust forms of brown algae (PHAEOPHYCEAE) in the order Laminariales. They are a major component of the lower intertidal and sublittoral zones on rocky coasts in temperate and polar waters. Kelp, a kind of SEAWEED, usually refers to species in the genera LAMINARIA or MACROCYSTIS, but the term may also be used for species in FUCUS or Nereocystis.

Software used to locate data or information stored in machine-readable form locally or at a distance such as an INTERNET site.