Feature Aggregation Tree: Capture Temporal Motion Information for Action Recognition in Videos
We propose a model named Feature Aggregation Tree to capture the temporal motion information in videos for action recognition. The Feature Aggregation Tree constructs a logical motion sequence by considering the concrete semantics of features and mining feature combinations in a video. It stores different feature combinations and then uses a Bayesian model to compute the conditional probabilities of frame-level features given the previous features, aggregating the features accordingly. The model is insensitive to video length. Compared with existing feature aggregation methods, which aim to enhance the descriptive capacity of features, our model has the following advantages: (i) it considers the temporal motion information in a video and predicts conditional probabilities with a Bayesian model; (ii) it can handle videos of arbitrary length, without resorting to uniform sampling or fixed-length feature encoding; (iii) it is compact and efficient compared with other encoding methods, and yields significant improvements over baseline methods. Experiments on the UCF101 and HMDB51 datasets demonstrate the effectiveness of our method.
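As a rough, non-authoritative illustration of the aggregation idea in the abstract, the following minimal Python sketch weights each frame-level feature by a conditional probability given the previous frame and then averages, so videos of any length are handled. The first-order (Markov) simplification, the Gaussian transition likelihood, and all function names here are assumptions for illustration only; the paper's actual tree construction and Bayesian model are not specified at this level of detail.

import numpy as np

def conditional_score(prev_feat, cur_feat, sigma=1.0):
    # Hypothetical Gaussian stand-in for the Bayesian conditional
    # probability p(f_t | f_{t-1}); the paper's model may differ.
    d = np.linalg.norm(cur_feat - prev_feat)
    return np.exp(-d ** 2 / (2 * sigma ** 2))

def aggregate_video(features, sigma=1.0):
    # Weight each frame-level feature by its conditional score given
    # the previous frame, then take the weighted mean; no uniform
    # sampling or fixed-length encoding is needed.
    features = np.asarray(features, dtype=float)
    weights = np.ones(len(features))
    for t in range(1, len(features)):
        weights[t] = conditional_score(features[t - 1], features[t], sigma)
    weights /= weights.sum()
    return (weights[:, None] * features).sum(axis=0)

# Usage: a toy "video" of 5 frames with 4-dimensional features.
rng = np.random.default_rng(0)
video = rng.normal(size=(5, 4))
print(aggregate_video(video))  # one aggregated 4-dim descriptor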
Action recognition · Feature learning · Feature aggregation
Bing Hu
Beijing Laboratory of Intelligent Information Technology, School of Computer Science, Beijing Institute of Technology (BIT), Beijing 100081, People's Republic of China
International Conference
Guangzhou
English
316-327
2018-11-23 (date first posted on the Wanfang platform; does not represent the paper's publication date)