Feature Aggregation Tree: Capture Temporal Motion Information for Action Recognition in Videos
We propose a model named Feature Aggregation Tree to capture the temporal motion information in videos for action recognition. The Feature Aggregation Tree constructs a logical motion sequence by considering the concrete semantics of features and mining feature combinations in a video. It stores different feature combinations and then uses a Bayesian model to compute the conditional probabilities of frame-level features given the previous features, aggregating the features accordingly. The model is insensitive to video length. Compared with existing feature aggregation methods, which aim to enhance the descriptive capacity of features, our model has the following advantages: (i) it considers the temporal motion information in a video and predicts conditional probabilities with a Bayesian model; (ii) it can handle videos of arbitrary length, without resorting to uniform sampling or fixed-length feature encoding; (iii) it is compact and efficient compared with other encoding methods, and yields significant improvements over baseline methods. Experiments on the UCF101 and HMDB51 datasets demonstrate the effectiveness of our method.
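As a rough, non-authoritative illustration of the aggregation idea in the abstract, the following minimal Python sketch weights each frame-level feature by a conditional probability given the previous frame and then averages, so videos of any length are handled. The first-order (Markov) simplification, the Gaussian transition likelihood, and all function names here are assumptions for illustration only; the paper's actual tree construction and Bayesian model are not specified at this level of detail.

import numpy as np

def conditional_score(prev_feat, cur_feat, sigma=1.0):
    # Hypothetical Gaussian stand-in for the Bayesian conditional
    # probability p(f_t | f_{t-1}); the paper's model may differ.
    d = np.linalg.norm(cur_feat - prev_feat)
    return np.exp(-d ** 2 / (2 * sigma ** 2))

def aggregate_video(features, sigma=1.0):
    # Weight each frame-level feature by its conditional score given
    # the previous frame, then take the weighted mean; no uniform
    # sampling or fixed-length encoding is needed.
    features = np.asarray(features, dtype=float)
    weights = np.ones(len(features))
    for t in range(1, len(features)):
        weights[t] = conditional_score(features[t - 1], features[t], sigma)
    weights /= weights.sum()
    return (weights[:, None] * features).sum(axis=0)

# Usage: a toy "video" of 5 frames with 4-dimensional features.
rng = np.random.default_rng(0)
video = rng.normal(size=(5, 4))
print(aggregate_video(video))  # one aggregated 4-dim descriptor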
Action recognition · Feature learning · Feature aggregation
Bing Hu
Beijing Laboratory of Intelligent Information Technology, School of Computer Science, Beijing Institute of Technology (BIT), Beijing 100081, People's Republic of China
International Conference
Guangzhou
English
316-327
2018-11-23 (date first posted on the Wanfang platform; does not represent the paper's publication date)