A Large-scale Online Search System of High-Dimensional Vectors Based on Key-Value Store
In this paper,we propose an online search system based on Key-Value Store which aims to provide real-time k-NN (k-Nearest Neighbor) search in large-scale high-dimensional vector spaces.Through an improved indexing method based on KD-tree,the vector space is divided into a number of fixed-size heaps,only vectors of a specified heap need to do k-NN calculation,the time complexity is O(1).So,the key step becomes to search a heap in the Key-Value Store by heap ID,we approximate the global computing problem of Euclidean distance k-NN search into a look-up table problem.Therefore,on the one hand,the search cost of a single vector mainly depends on the progress to find the value of a given key in Key-Value Store,whose time complexity is O (log2 N).On the other hand,because of the linear horizontal scaling properties of the Key- Value Store,we can increase the number of hosts to linearly increase the throughput of the entire cluster.So,maintaining a fixed proportion of the cluster size and the target vector space can obtain relatively fixed search performance,and complete the search process in a relatively fixed period of time.As a result, even if the vector space is very large,we can also achieve an effective online search by enough number of hosts.Firstly,this paper introduces the main processes of the system,then the indexing algorithm and vector data storage format is described.Finally,we take the Euclidean distance of the 128-dimensional SIFT feature vectors k-NN search as load to give the performance assessment.
high-dimensional k-NN Key-Value Store largescale online search system
Yuezhuo Zhang Li Zha Jia Liu Zhiwei Xu
Institute of Computing Technology Chinese Academy of Sciences, Beijing, China;Graduate University of Institute of Computing Technology Chinese Academy of Sciences, Beijing, China
国际会议
第8届语义知识与网络国际会议(2012 Eighth International Conference on Semanties,Knowledge and Grids )(SKG2012)
北京
英文
233-236
2012-10-22(万方平台首次上网日期,不代表论文的发表时间)