A Visual Language Model for Estimating Object Pose and Structure in a Generative Visual Domain

Abstract:

We present a generative domain of visual objects by analogy to the generative nature of human language. Just as small inventories of phonemes and words combine in a grammatical fashion to yield myriad valid words and utterances, a small inventory of physical parts combines in a grammatical fashion to yield myriad valid assemblies. We apply the notion of a language model from speech recognition to this visual domain to similarly improve the performance of the recognition process over what would be possible by only applying recognizers to the components. Unlike the context-free models for human language, our visual language models are context sensitive and formulated as stochastic constraint-satisfaction problems. And unlike the situation for human language where all components are observable, our methods deal with occlusion, successfully recovering object structure despite unobservable components. We demonstrate our system with an integrated robotic system for disassembling structures that performs whole-scene reconstruction consistent with a language model in the presence of noisy feature detectors.

Authors: Siddharth Narayanaswamy, Andrei Barbu, Jeffrey Mark Siskind

Affiliation: School of Electrical and Computer Engineering, Purdue University, West Lafayette, IN 47907, USA

Conference type: International conference

Conference name: 2011 IEEE International Conference on Robotics and Automation (ICRA 2011)

Conference location: Shanghai

Conference language: English

Pages: 4854-4860

Online publication date: 2011-05-09 (the date the record first went online on the Wanfang platform, not the paper's publication date)
