Conference topic

Pixel Saliency Based Encoding for Fine-Grained Image Classification

Fine-grained image classification concerns categorization at subordinate levels, where the distinctions between classes are very subtle and highly local. Recently, Convolutional Neural Networks (CNNs) have yielded nearly the best results on basic image classification tasks. In a CNN, a direct pooling operation is typically used to resize the last convolutional feature maps from n × n × c to 1 × 1 × c for feature representation. However, such a pooling operation may lead to extreme saliency compression of the feature map, especially in fine-grained image classification. In this paper, to explore the representation ability of the feature map more deeply, we propose a Pixel Saliency based Encoding method, called PS-CNN. First, in PS-CNN, a saliency matrix is obtained by evaluating the saliency of each pixel in the feature map. Then, we segment the original feature maps into multiple ones using binary masks generated by thresholding the saliency matrix, and subsequently squeeze those masked feature maps into encoded ones. Finally, a fine-grained feature representation is generated by concatenating the original feature maps with the encoded ones. Experimental results show that our simple yet powerful PS-CNN outperforms state-of-the-art classification approaches. Specifically, we achieve 89.1% classification accuracy on Aircraft, 92.3% on Stanford Car, and 81.9% on NABirds.
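The encoding steps described in the abstract (saliency matrix, thresholded binary masks, squeezing, concatenation) can be illustrated with a minimal NumPy sketch. The abstract does not specify how saliency is evaluated, which thresholds are used, or how masked maps are squeezed, so everything below is an assumption made for illustration: the hypothetical function `ps_encode`, the use of the channel-wise mean as the saliency score, the threshold values, and masked average pooling as the squeeze step. It should not be read as the authors' implementation.

```python
import numpy as np


def ps_encode(feature_map, thresholds=(0.25, 0.5, 0.75)):
    """Illustrative pixel-saliency encoding of an (n, n, c) feature map.

    Returns a 1-D vector of length c * (1 + len(thresholds)).
    All design choices here are assumptions, not taken from the paper.
    """
    # Assumption: the saliency of a pixel is its mean activation over channels.
    saliency = feature_map.mean(axis=2)

    # Rescale saliency to [0, 1] so thresholds are relative to this map.
    lo, hi = saliency.min(), saliency.max()
    saliency = (saliency - lo) / (hi - lo + 1e-12)

    # Baseline representation: global average pooling, n x n x c -> c.
    parts = [feature_map.mean(axis=(0, 1))]

    for t in thresholds:
        # Binary mask selecting pixels whose saliency reaches the threshold.
        mask = saliency >= t
        # Assumption: "squeeze" = average over the selected pixels only.
        count = max(int(mask.sum()), 1)  # avoid division by zero
        pooled = (feature_map * mask[:, :, None]).sum(axis=(0, 1)) / count
        parts.append(pooled)

    # Concatenate the pooled original map with the encoded (masked) ones.
    return np.concatenate(parts)


# Usage: a random 7 x 7 x 16 feature map with three thresholds.
fmap = np.random.default_rng(0).random((7, 7, 16))
vec = ps_encode(fmap)
print(vec.shape)  # (64,) = 16 channels * (1 original + 3 masked)
```

Under these assumptions, plain global pooling averages a single highly salient pixel together with all background pixels, whereas the masked terms keep its activation undiluted, which is the kind of saliency compression the abstract refers to.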

Pixel saliency; Feature encoding; Fine-grained; Image classification

Chao Yin, Lei Zhang, Ji Liu

College of Communication Engineering, Chongqing University, No. 174 Shazheng Street, Shapingba District, Chongqing 400044, China

International conference

Chinese Conference on Pattern Recognition and Computer Vision (PRCV 2018)

Guangzhou

English

pp. 274-285

2018-11-23 (date first posted online on the Wanfang platform; not the paper's publication date)