High Dimensional Image Categorization

Abstract:

We are interested in varying the vocabulary size in the image categorization task with a bag-of-visual-words model, to investigate its influence on classification accuracy in two cases. In the first case, the test set and the training set contain the same objects (the test set only shows them from different viewpoints). In the second case, objects in the test set do not appear at all in the training set (only other objects from the same category appear). To perform these tasks, we need to scale up the algorithms so they can deal with millions of data points in hundreds of thousands of dimensions. We present k-means (used in the quantization step) and SVM (used in the classification step) algorithms extended to deal with very large datasets. These new incremental and parallel algorithms can be used on various distributed architectures, such as multi-threaded computers, clusters or GPUs (graphics processing units). The efficiency of the approach is shown by the categorization of the 3D-Dataset from Savarese and Fei-Fei, which contains about 6700 images of 3D objects from 10 different classes. The resulting incremental and parallel SVM algorithm is several orders of magnitude faster than usual ones (such as lib-SVM, SVMperf or CB-SVM), and the incremental and parallel k-means is at least one order of magnitude faster than usual implementations.
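The abstract does not describe how the incremental and parallel k-means works internally. The sketch below illustrates the general idea under stated assumptions. It uses a generic chunked (out-of-core) variant of Lloyd's k-means that keeps only one chunk of descriptors in memory at a time. Per-chunk partial sums are independent and are merged afterwards, which is the property that lets such a step be spread over threads, cluster nodes or a GPU. The learned centroids are then used as the visual vocabulary to quantize an image's descriptors into a bag-of-visual-words histogram, so the vocabulary size is simply the number of centroids `k = len(init)`. This is not the authors' algorithm. All names (`chunked_kmeans`, `partial_stats`, `bovw_histogram`, `load_chunks`) are hypothetical, and the toy 2-D data stands in for real high-dimensional image descriptors.

```python
from typing import Callable, Iterable, List, Sequence, Tuple

Vector = List[float]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(centroids: Sequence[Vector], x: Sequence[float]) -> int:
    """Index of the centroid (visual word) closest to descriptor x."""
    return min(range(len(centroids)), key=lambda j: sq_dist(centroids[j], x))


def partial_stats(chunk: Iterable[Vector],
                  centroids: Sequence[Vector]) -> Tuple[List[Vector], List[int]]:
    """Per-cluster coordinate sums and counts for one chunk of descriptors.

    Chunks do not depend on each other, so (as an assumption of this sketch)
    this step could run on separate workers and be merged afterwards."""
    k, dim = len(centroids), len(centroids[0])
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k
    for x in chunk:
        j = nearest(centroids, x)
        counts[j] += 1
        for d in range(dim):
            sums[j][d] += x[d]
    return sums, counts


def chunked_kmeans(load_chunks: Callable[[], Iterable[List[Vector]]],
                   init: Sequence[Vector], iters: int = 10) -> List[Vector]:
    """Lloyd's k-means that needs only one chunk in memory at a time.

    load_chunks() is called once per iteration and yields the dataset
    chunk by chunk (for example, read from disk)."""
    centroids = [list(c) for c in init]
    k, dim = len(centroids), len(centroids[0])
    for _ in range(iters):
        sums = [[0.0] * dim for _ in range(k)]
        counts = [0] * k
        # Sequential map here; a parallel map could be swapped in.
        for c_sums, c_counts in map(lambda ch: partial_stats(ch, centroids),
                                    load_chunks()):
            for j in range(k):
                counts[j] += c_counts[j]
                for d in range(dim):
                    sums[j][d] += c_sums[j][d]
        for j in range(k):
            if counts[j]:  # keep the previous centroid if a cluster is empty
                centroids[j] = [s / counts[j] for s in sums[j]]
    return centroids


def bovw_histogram(descriptors: Iterable[Vector],
                   centroids: Sequence[Vector]) -> List[float]:
    """Normalized bag-of-visual-words histogram for one image."""
    hist = [0.0] * len(centroids)
    for x in descriptors:
        hist[nearest(centroids, x)] += 1.0
    total = sum(hist)
    return [h / total for h in hist] if total else hist


# Toy usage: two well-separated groups of 2-D "descriptors", split into 2 chunks.
data = [[0.0, 0.0], [0.2, 0.1], [0.1, 0.2], [5.0, 5.0], [5.2, 4.9], [4.8, 5.1]]
vocab = chunked_kmeans(lambda: [data[:3], data[3:]],
                       init=[[0.0, 0.0], [5.0, 5.0]], iters=5)
print(bovw_histogram([[0.1, 0.0], [4.9, 5.0], [5.1, 5.1], [0.0, 0.3]], vocab))
# prints [0.5, 0.5]
```

The merge step only adds per-cluster sums and counts. Because this is associative, the result does not depend on how the chunks are distributed among workers. This is one common way to make k-means both incremental (streaming over data that does not fit in memory) and parallel.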

Keywords: High-dimensional classification, parallel algorithms, image categorization, GPU-based parallel algorithms

Authors: Francois Poulet, Nguyen-Khang Pham

Affiliations: University of Rennes I - IRISA, Campus de Beaulieu, 35042 Rennes Cedex, France; Cantho University, 1 Ly Tu Trong Street, Cantho, Vietnam

Conference type: International conference

Conference name: 6th International Conference on Advanced Data Mining and Applications (ADMA 2010)

Conference location: Chongqing, China

Conference language: English

Pages: 465-476

Online publication date: 2010-11-19 (the date the record was first posted on the Wanfang platform, not the paper's publication date)
