Structure Discovery in Multi-modal Data: a Region-based Approach

Abstract:

The ability of a perception system to discern what is important in a scene and what is not is an invaluable asset, with multiple applications in object recognition, people detection and SLAM, among others. In this paper, we aim to analyze all sensory data available to separate a scene into a few physically meaningful parts, which we term structure, while discarding background clutter. In particular, we consider the combination of image and range data, and base our decision on both appearance and 3D shape. Our main contribution is the development of a framework to perform scene segmentation that preserves physical objects using multi-modal data. We combine image and range data using a novel mid-level fusion technique based on the concept of regions that avoids any pixel-level correspondences between data sources. We associate groups of pixels with 3D points into multi-modal regions that we term regionlets, and measure the structure-ness of each regionlet using simple, bottom-up cues from image and range features. We show that the highest-ranked regionlets correspond to the most prominent objects in the scene. We verify the validity of our approach on 105 scenes of household environments.

Authors: Alvaro Collet, Siddhartha S. Srinivasa, Martial Hebert

Affiliations: The Robotics Institute, Carnegie Mellon University, Pittsburgh, PA, USA; Intel Labs Pittsburgh, 4720 Forbes Ave., Suite 410, Pittsburgh, PA, USA

Conference type: International conference

Conference name: 2011 IEEE International Conference on Robotics and Automation (ICRA 2011)

Conference location: Shanghai

Conference language: English

Pages: 5695-5702

Online publication date: 2011-05-09 (date first posted on the Wanfang platform; not the paper's publication date)
