10/27/2004 This paper concerns approximate nearest neighbor searching algorithms, which have become increasingly important, especially in high dimensional perception areas such as computer vision, with dozens of publications in recent years

Ting Liu 27 Oct

This paper concerns approximate nearest neighbor searching algorithms, which have become increasingly important, especially in high dimensional perception areas such as computer vision, with dozens of publications in recent years. Much of this enthusiasm is due to a successful new approximate nearest neighbor approach called Locality Sensitive Hashing (LSH). In this paper we ask the question: can earlier spatial data structure approaches to {\em exact} nearest neighbor, such as metric trees, be altered to provide approximate answers to proximity queries and if so, how? We introduce a new kind of metric tree that allows overlap: certain datapoints may appear in both the children of a parent. We also introduce new approximate k-NN search algorithms on this structure. We show why these structures should be able to exploit the same random-projection-based approximations that LSH enjoys, but with a simpler algorithm and perhaps with greater efficiency. We then provide a detailed empirical evaluation on five large, high dimensional datasets which show accelerations one to three orders of magnitude over LSH. This result holds true throughout the spectrum of approximation levels.
The following papers are all related to this talk:

T. Liu, A.W.Moore, K. Yang, and A. Gray. An Investigation of Practical Approximate Nearest Neighbor Algorithms. Accepted to NIPS 2004. (I put the technical report version on-line)
A. Gionis, P.Indyk, and R.Motwani. Similarity Search in High Dimensions via Hashing. In Proc 25th VLDB conference, 1999
P. Indyk and R. Motwani. Approximate nearest neighbors: towards removing the curse of dimensionality. In STOC, pages 604-613, 1998
P. Indyk and N. Thaper. Fast image retrieval via embeddings. In the 3rd International Workshop on Statistical and Computational Theories of Vision (SCTV 2003)