Pixel Saliency Based Encoding for Fine-Grained Image Classification
Fine-grained image classification concerns categorization at subordinate levels, where the differences between classes are subtle and highly local. Recently, Convolutional Neural Networks (CNNs) have achieved nearly the best results on basic image classification tasks. In a CNN, a direct pooling operation is typically used to reduce the last convolutional feature maps from n × n × c to 1 × 1 × c for feature representation. However, such a pooling operation may severely compress the saliency information in the feature map, especially in fine-grained image classification. In this paper, to explore the representation ability of the feature map more deeply, we propose a Pixel Saliency based Encoding method, called PS-CNN. First, PS-CNN obtains a saliency matrix by evaluating the saliency of each pixel in the feature map. Then, we segment the original feature maps into multiple ones using binary masks generated by thresholding the saliency matrix, and subsequently squeeze the masked feature maps into encoded ones. Finally, a fine-grained feature representation is generated by concatenating the original feature maps with the encoded ones. Experimental results show that our simple yet powerful PS-CNN outperforms state-of-the-art classification approaches. Specifically, it achieves 89.1% classification accuracy on Aircraft, 92.3% on Stanford Car, and 81.9% on NABirds.
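The encoding steps named in the abstract (saliency matrix, thresholded binary masks, squeezing, concatenation) can be illustrated with a minimal NumPy sketch. The abstract does not say how saliency is computed, which thresholds are used, or how the masked maps are squeezed, so everything specific here is an assumption: the function name `ps_encode`, the use of the channel-wise mean as per-pixel saliency, the threshold values, and masked average pooling as the squeeze step. It is not the authors' implementation.

```python
import numpy as np


def ps_encode(feature_map, thresholds=(0.25, 0.5, 0.75)):
    """Hypothetical pixel-saliency encoding of one (n, n, c) feature map.

    Returns a 1-D vector of length c * (1 + len(thresholds)).
    """
    # Assumption: per-pixel saliency is the mean over channels,
    # min-max normalised to [0, 1]. The abstract does not define it.
    saliency = feature_map.mean(axis=2)
    lo, hi = saliency.min(), saliency.max()
    saliency = (saliency - lo) / (hi - lo + 1e-12)

    # Direct (global average) pooling of the original maps: n x n x c -> c.
    parts = [feature_map.mean(axis=(0, 1))]

    for t in thresholds:
        # Binary mask of pixels whose saliency reaches the threshold.
        mask = saliency >= t
        # Assumption: "squeeze" is average pooling over the kept pixels.
        kept = max(int(mask.sum()), 1)  # avoid division by zero
        masked = feature_map * mask[:, :, None]
        parts.append(masked.sum(axis=(0, 1)) / kept)

    # Concatenate the original pooled features with the encoded ones.
    return np.concatenate(parts)
```

For a 7 × 7 × 16 feature map and three thresholds, the sketch returns a vector of length 16 × (1 + 3) = 64. Its first 16 entries are the ordinary globally pooled features, and each later group of 16 summarises only the pixels above one saliency threshold.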
Pixel saliency; Feature encoding; Fine-grained; Image classification
Chao Yin, Lei Zhang, Ji Liu
College of Communication Engineering, Chongqing University, No. 174 Shazheng Street, Shapingba District, Chongqing 400044, China
International conference
Guangzhou
English
274-285
2018-11-23 (date first posted online on the Wanfang platform; not the paper's publication date)