Recent years have witnessed the emergence of vector quantization (VQ) techniques for efficient similarity search. VQ partitions the feature space into a set of codewords and encodes each data point as integer indices into those codewords, so that the distance between data points can be approximated efficiently by simple memory-lookup operations. This compact quantization significantly reduces both storage cost and search complexity, which makes large-scale similarity search efficient. However, the performance of several celebrated VQ approaches degrades significantly on noisy data. In addition, these approaches can barely support a wide range of applications, because their distortion measure is limited to the $\ell_2$ norm. To address the shortcomings of the squared Euclidean ($\ell_2$-norm) loss employed by these VQ approaches, in this paper we propose a novel robust and general VQ framework, named RGVQ, that enhances both the robustness and the generality of VQ approaches. Specifically, an $\ell_p^q$-norm loss function is proposed to conduct $\ell_p$-norm similarity search rather than $\ell_2$-norm search, and the $q$-th order loss is used to enhance robustness. Although changing the loss function to the $\ell_p^q$ norm makes VQ approaches more robust and generic, it raises a challenge: a non-smooth, non-convex, orthogonality-constrained $\ell_p^q$-norm function has to be minimized. To solve this problem, we propose a novel and efficient optimization scheme, specialize it to VQ approaches, and theoretically prove its convergence. Extensive experiments on benchmark data sets demonstrate that RGVQ outperforms the original versions of several VQ approaches, especially when searching for similar items in noisy data.
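The encode-then-lookup pipeline described in the abstract can be illustrated with a small, product-quantization-style sketch in plain Python. This is not RGVQ itself: the paper learns its codebooks by minimizing an orthogonality-constrained $\ell_p^q$ loss, whereas here `train_codebooks` simply samples data subvectors as codewords (a hypothetical stand-in). All function names (`split`, `encode`, `lookup_tables`, `approx_loss`, `search`) are illustrative assumptions. The sketch shows how a point becomes a short list of integer indices, and how, for each query, a table of per-subspace $\ell_p^p$ distances lets the $\ell_p^q$ loss to every encoded point be computed with one lookup per subspace.

```python
import random
from typing import List, Sequence

Vector = List[float]
Codebooks = List[List[Vector]]  # codebooks[s][j] = j-th codeword of subspace s


def split(x: Sequence[float], m: int) -> List[Vector]:
    """Split x into m equal-length subvectors (len(x) is assumed divisible by m)."""
    d = len(x) // m
    return [list(x[i * d:(i + 1) * d]) for i in range(m)]


def lp_pow(u: Sequence[float], v: Sequence[float], p: float) -> float:
    """Return sum_i |u_i - v_i|^p, i.e. the l_p distance raised to the power p."""
    return sum(abs(a - b) ** p for a, b in zip(u, v))


def train_codebooks(data: List[Vector], m: int, k: int, seed: int = 0) -> Codebooks:
    """Hypothetical stand-in for training: use k random data points' subvectors as codewords."""
    rng = random.Random(seed)
    chosen = [split(data[j], m) for j in rng.sample(range(len(data)), k)]
    return [[parts[s] for parts in chosen] for s in range(m)]


def encode(x: Vector, codebooks: Codebooks, p: float) -> List[int]:
    """Encode x as one integer index per subspace (nearest codeword under l_p)."""
    subs = split(x, len(codebooks))
    return [min(range(len(cb)), key=lambda j: lp_pow(sub, cb[j], p))
            for sub, cb in zip(subs, codebooks)]


def reconstruct(code: List[int], codebooks: Codebooks) -> Vector:
    """Concatenate the selected codewords back into a full-length vector."""
    return [v for cb, j in zip(codebooks, code) for v in cb[j]]


def lookup_tables(query: Vector, codebooks: Codebooks, p: float) -> List[List[float]]:
    """Once per query, precompute |q_s - c|_p^p for every subspace s and codeword c."""
    subs = split(query, len(codebooks))
    return [[lp_pow(sub, c, p) for c in cb] for sub, cb in zip(subs, codebooks)]


def approx_loss(code: List[int], tables: List[List[float]], p: float, q: float) -> float:
    """l_p^q loss between the query and the reconstructed point, via table lookups."""
    s = sum(table[j] for table, j in zip(tables, code))  # = ||query - x_hat||_p^p
    return s ** (q / p)


def search(query: Vector, codes: List[List[int]], codebooks: Codebooks,
           p: float = 1.0, q: float = 1.0, topk: int = 3) -> List[int]:
    """Rank database items by approximate l_p^q loss using only lookups and additions."""
    tables = lookup_tables(query, codebooks, p)
    order = sorted(range(len(codes)), key=lambda i: approx_loss(codes[i], tables, p, q))
    return order[:topk]


if __name__ == "__main__":
    rng = random.Random(42)
    data = [[rng.gauss(0.0, 1.0) for _ in range(8)] for _ in range(200)]
    books = train_codebooks(data, m=4, k=16)
    codes = [encode(x, books, p=1.0) for x in data]
    print(search(data[0], codes, books, p=1.0, q=1.0, topk=5))
```

One design note on this sketch: because $s \mapsto s^{q/p}$ is monotone, $q$ does not change the ranking for a single query. In a framework like the one described, $q$ matters in the training objective summed over many points, where a smaller $q$ lets large (e.g. noisy) residuals dominate the total less than a squared loss would; that training step is not implemented here.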
This article was published in IEEE Transactions on Image Processing, a publication of the IEEE Signal Processing Society.
Protein sequence similarity search is one of the most commonly used bioinformatics methods for identifying evolutionarily related proteins. In general, sequences that are evolutionarily related share ...
Sequence similarity searches have been widely used in the analyses of metagenomic sequencing data. Finding homologous sequences in a reference database enables the estimation of taxonomic and function...
In the re-evaluated paper, Xie et al. proposed a new fast search algorithm for vector quantization encoding, which optimized the priority checking order of variance and norm inequality in order to spe...
Molecular-similarity searches based on two-dimensional (2D) fingerprint and three-dimensional (3D) shape represent two widely used ligand-based virtual screening (VS) methods in computer-aided drug de...
Target tracking is one of the broad applications of underwater wireless sensor networks (UWSNs). However, as a result of the temporal and spatial variability of acoustic channels, underwater acoustic ...
The primary objective of this study was to demonstrate pharmacokinetic similarity of Mylan trastuzumab (Hercules) versus EU-approved Herceptin® and US-licensed Herceptin® and pharmacokin...
This study aims to describe in depth the CNS, CNS HIV reservoir and CNS viral rebound in consenting SEARCH 019 subjects prior to, during and after the SEARCH 019 study intervention (VHM + ...
Robust I study is a feasibility study about safety and efficacy of DCB.
There are large differences in knowledge between patients and healthcare providers (i.e. physicians, physician assistants and nurse practitioners), and there is a strong interest on the pa...
The Rapid Study: Randomized Phase II Study To Expedite Allogeneic Transplant With Immediate Haploidentical Plus Unrelated Cord Donor Search Versus Matched Unrelated Donor Search For AML And High-Risk MDS Patients
The study seeks to compare time from formal search to hematopoietic cell transplantation (HCT) for patients 18 years and older, randomized between haplo-cord search and matched unrelated d...
Measurable biological (physiological, biochemical, and anatomical features), behavioral (psychometric pattern) or cognitive markers that are found more often in individuals with a disease than in the general population. Because many endophenotypes are present before the disease onset and in individuals with heritable risk for disease such as unaffected family members, they can be used to help diagnose and search for causative genes.
A love or pursuit of wisdom. A search for the underlying causes and principles of reality. (Webster, 3d ed)
The systematic search and discovery of natural substances which may have potential commercial applications.
Large, robust forms of brown algae (PHAEOPHYCEAE) in the order Laminariales. They are a major component of the lower intertidal and sublittoral zones on rocky coasts in temperate and polar waters. Kelp, a kind of SEAWEED, usually refers to species in the genera LAMINARIA or MACROCYSTIS, but the term may also be used for species in FUCUS or Nereocystis.
Software used to locate data or information stored in machine-readable form locally or at a distance such as an INTERNET site.
A generic drug (generic drugs, short: generics) is a drug defined as "a drug product that is comparable to brand/reference listed drug product in dosage form, strength, route of administration, quality and performance characteristics, and intended u...