ISTC-CC Abstract
High-Confidence Near-Duplicate Image Detection
ACM International Conference on Multimedia Retrieval (ICMR'12), Hong Kong, Jun. 5-8, 2012.
Wei Dong*, Zhe Wang, Moses Charikar, Kai Li
Princeton University
*Independent Researcher
In this paper, we propose two techniques for near-duplicate image detection at high confidence and large scale. First, we show that entropy-based filtering eliminates the ambiguous SIFT features that cause most false positives, and makes it possible to claim near-duplicity from a single match among the retained high-quality features. Second, we show that graph cut can be used for query expansion, with a duplicity graph computed offline, to substantially improve search quality. Evaluation with web images shows that, when combined with sketch embedding [6], our methods achieve a false positive rate orders of magnitude lower than the standard visual word approach. We demonstrate the proposed techniques with a large-scale image search engine which, using an index data structure computed offline on a Hadoop cluster, is capable of serving more than 50 million web images from a single commodity server.
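As a rough illustration of the first technique, the sketch below scores each descriptor by the Shannon entropy of a histogram of its component values and discards low-scoring ones. The abstract does not specify how entropy is computed or thresholded, so the histogram definition, the bin count, the value range, the threshold, and the function names `descriptor_entropy` and `filter_features` are all assumptions made for illustration, not the paper's method.

```python
import math
from collections import Counter


def descriptor_entropy(descriptor, bins=16, max_value=256):
    """Shannon entropy (in bits) of a histogram of the descriptor's values.

    Assumption: components are integers in [0, max_value), as in a
    byte-quantized 128-dimensional SIFT descriptor.
    """
    width = max_value / bins
    # Assign each component to a histogram bin, clamping to the last bin.
    counts = Counter(min(int(v / width), bins - 1) for v in descriptor)
    total = len(descriptor)
    # H = sum over bins of p * log2(1 / p), with p = count / total.
    return sum((c / total) * math.log2(total / c) for c in counts.values())


def filter_features(descriptors, min_entropy):
    """Keep only descriptors whose entropy reaches the (hypothetical) threshold."""
    return [d for d in descriptors if descriptor_entropy(d) >= min_entropy]


# A flat descriptor carries no information; a spread-out one scores higher.
flat = [0] * 128
varied = list(range(0, 256, 2))
kept = filter_features([flat, varied], min_entropy=2.0)
```

Here the flat descriptor is dropped and the varied one is kept. In practice the threshold would have to be tuned on labeled data, since a stricter cutoff trades recall for a lower false positive rate.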
FULL PAPER: pdf