Learning Non-local Representation for Visual Tracking
Discriminative Correlation Filter (DCF) based trackers have tremendously improved tracking performance. They use the first frame of a video sequence to initialize the tracker and provide a fast solution owing to their formulation in the Fourier domain. Previous work that applies a DCF layer on top of a pretrained CNN, however, has not taken full advantage of the CNN feature maps. In this paper, we propose a tracking architecture that fuses local and global response maps for accurate and robust visual tracking. The feature map extracted from a pretrained CNN is fed to a fully convolutional DCF layer and a non-local layer to capture the local and global response maps, respectively. Experiments show that our method achieves state-of-the-art performance on three popular benchmarks: OTB-2013, OTB-2015, and VOT2016.
Visual tracking; DCF; Non-local; Feature pyramid
Peng Zhang, Zengfu Wang
Institute of Intelligent Machines, Chinese Academy of Sciences, Hefei, China; University of Science and Technology of China, Hefei, China; National Engineering Laboratory for Speech and Language Information Processing, Hefei, China
International conference
Guangzhou
English
209-220
2018-11-23 (date first made available online on the Wanfang platform; not the paper's publication date)