Hierarchical Convolutional Features for Visual Tracking
Chao Ma (SJTU), Jia-Bin Huang (UIUC), Xiaokang Yang (SJTU), Ming-Hsuan Yang (UC Merced)
Hierarchical Convolutional Features for Visual Tracking
• What is visual tracking?
• How is it done?
• What is the novel point of this paper?
Visual Tracking
• A typical scenario of visual tracking is to track an unknown target object, specified by a bounding box in the first frame.
Visual Tracking Methods
• Tracking by binary classifiers
  • Visual tracking can be posed as a repeated detection problem in a local window. For each frame, a set of positive and negative training samples is collected to incrementally learn a discriminative classifier that separates the target from its background.
  • Limitation: sampling ambiguity
• Tracking by correlation filters
  • Tracking methods based on correlation filters regress all circularly shifted versions of the input features to a target Gaussian function, so no hard-thresholded samples of target appearance are needed.
• Tracking by CNNs
  • Visual representations are of great importance for object tracking.
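The correlation-filter idea above can be illustrated with a minimal single-channel sketch in NumPy. This is not the authors' implementation: the function names, the regularization weight `lam`, the label width `sigma`, and the random patch standing in for image features are all assumptions made for illustration. It uses one common convention in which ridge regression over all circular shifts is solved independently at each frequency.

```python
import numpy as np

def gaussian_label(height, width, sigma=2.0):
    """Soft regression target: a Gaussian that peaks at zero shift and wraps circularly."""
    rows = np.arange(height)
    cols = np.arange(width)
    rows = np.minimum(rows, height - rows)  # circular distance to row 0
    cols = np.minimum(cols, width - cols)   # circular distance to column 0
    rr, cc = np.meshgrid(rows, cols, indexing="ij")
    return np.exp(-(rr ** 2 + cc ** 2) / (2.0 * sigma ** 2))

def train_filter(patch, label, lam=1e-4):
    """Ridge regression over all circular shifts of `patch`, solved per frequency."""
    X = np.fft.fft2(patch)
    Y = np.fft.fft2(label)
    return Y * np.conj(X) / (X * np.conj(X) + lam)

def respond(filter_hat, patch):
    """Response map; the position of its maximum is the estimated translation."""
    Z = np.fft.fft2(patch)
    return np.real(np.fft.ifft2(filter_hat * Z))

rng = np.random.default_rng(0)
template = rng.standard_normal((32, 32))          # hypothetical stand-in for a feature patch
filter_hat = train_filter(template, gaussian_label(32, 32))
shifted = np.roll(template, (3, 5), axis=(0, 1))  # target moved 3 rows down, 5 columns right
response = respond(filter_hat, shifted)
estimated_shift = tuple(int(i) for i in np.unravel_index(np.argmax(response), response.shape))
```

No positive or negative samples are drawn anywhere in this sketch: every circular shift of the template contributes to the regression with a soft Gaussian label, and the maximum of `response` should land at the shift that was applied.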
Chao Ma’s Work
• Correlation filters are learned over multi-dimensional features in a way similar to existing methods. The main difference lies in the use of learned CNN features rather than hand-crafted features.
• Earlier CNN trackers all rely on positive and negative training samples and exploit only the features from the last layer. In contrast, the proposed approach builds on adaptive correlation filters, which regress the dense, circularly shifted samples to soft labels and effectively alleviate sampling ambiguity.
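Learning a correlation filter over multi-dimensional features can be sketched as follows. This is a hedged illustration, not the paper's code: `train_multichannel_filter`, `layer_response`, `lam`, and the random array standing in for a layer of CNN feature maps are hypothetical. The sketch assumes the standard multi-channel form in which all channels share one denominator, the summed spectral energy plus a regularizer.

```python
import numpy as np

def train_multichannel_filter(features, label, lam=1e-4):
    """features: (H, W, D) array for one layer; label: (H, W) soft target.
    Returns Fourier-domain filters of shape (H, W, D), one per channel."""
    X = np.fft.fft2(features, axes=(0, 1))
    Y = np.fft.fft2(label)
    energy = np.sum(X * np.conj(X), axis=2).real + lam  # denominator shared by all channels
    return Y[..., None] * np.conj(X) / energy[..., None]

def layer_response(filters_hat, features):
    """Sum the per-channel responses in the Fourier domain, then invert."""
    Z = np.fft.fft2(features, axes=(0, 1))
    return np.real(np.fft.ifft2(np.sum(filters_hat * Z, axis=2)))

rng = np.random.default_rng(1)
feats = rng.standard_normal((16, 16, 8))  # random stand-in for CNN feature maps
dist = np.minimum(np.arange(16), 16 - np.arange(16))  # circular distance to index 0
label = np.exp(-(dist[:, None] ** 2 + dist[None, :] ** 2) / (2.0 * 1.5 ** 2))
filters_hat = train_multichannel_filter(feats, label)
moved = np.roll(feats, (2, 4), axis=(0, 1))  # simulate a target translation
response = layer_response(filters_hat, moved)
peak = tuple(int(i) for i in np.unravel_index(np.argmax(response), response.shape))
```

In a tracker along these lines, one such filter would be trained per convolutional layer, with the hand-crafted feature channels of earlier correlation-filter trackers replaced by the layer's feature maps.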
[Figure 3 diagram labels: cropped search window, Conv layers, correlation filters W, estimated position, tracking output]
Figure 3: Main steps of the proposed algorithm. Given an image, we first crop the search window centered at the estimated position in the previous frame. We use the third, fourth and fifth convolutional layers as our target representations. Each layer indexed by i is then convolved with the learned linear correlation filter w to generate a response map, whose location of the maximum value indicates the estimated target position. We search the multi-level response maps to infer the target location in a coarse-to-fine fashion.
Algorithm
• Use the convolutional feature maps from a CNN, e.g., AlexNet or VGG-Net, to encode target appearance. As features propagate forward through the CNN, the semantic discrimination between objects of different categories is strengthened, while the spatial resolution needed for precise localization is gradually reduced.
• Learn a discriminative classifier and estimate the translation of the target object by searching for the maximum value of the correlation response map.
• Given the set of correlation response maps, hierarchically infer the target translation at each layer.
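The hierarchical inference step can be sketched as a coarse-to-fine search over response maps. The code below is an illustrative assumption rather than the paper's exact procedure: it assumes all response maps have already been resized to a common shape, and the weight `gamma`, the search `radius`, the square clipped search window, and the toy maps are hypothetical choices.

```python
import numpy as np

def coarse_to_fine_peak(responses, gamma=1.0, radius=2):
    """responses: list of (H, W) maps ordered from the deepest layer to the shallowest.
    Start at the maximum of the deepest map, then refine on each shallower map,
    searching only within `radius` of the previous estimate."""
    coarse = responses[0]
    peak = np.unravel_index(np.argmax(coarse), coarse.shape)
    for fine in responses[1:]:
        combined = fine + gamma * coarse  # shallower map regularized by the deeper one
        height, width = combined.shape
        r0, r1 = max(peak[0] - radius, 0), min(peak[0] + radius + 1, height)
        c0, c1 = max(peak[1] - radius, 0), min(peak[1] + radius + 1, width)
        window = combined[r0:r1, c0:c1]
        dr, dc = np.unravel_index(np.argmax(window), window.shape)
        peak = (r0 + dr, c0 + dc)
        coarse = fine
    return int(peak[0]), int(peak[1])

# Toy maps: the shallower layers contain strong distractors far from the target.
deep = np.zeros((10, 10))
deep[5, 5] = 1.0
mid = np.zeros((10, 10))
mid[6, 5] = 1.5
mid[0, 0] = 3.0      # distractor outside the search window
shallow = np.zeros((10, 10))
shallow[6, 6] = 2.0
shallow[9, 9] = 5.0  # distractor outside the search window
estimate = coarse_to_fine_peak([deep, mid, shallow], gamma=1.0, radius=2)
```

In this toy case the global maximum of the shallowest map is a distractor, but the estimate from the deeper layers restricts the search, so the refined position stays near the target.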
• Implementation Details
• Experimental Validations
Conclusion
• Combine CNN features with correlation filters.
• Use not only the last layer but also the earlier layers of the CNN to achieve better performance.
• Extensive experimental results show that the proposed algorithm performs favorably against state-of-the-art methods in terms of accuracy and robustness.
Online Object Tracking with Proposal Selection
Reporter: Liu Cun
Student ID: 115413910018
2016.05.03