Towards human-like production and binaural localization of speech sounds in humanoid robots
We present a prototype of a humanoid robot head equipped with human-like speech sound localization and production systems, designed for a new generation of robots that should autonomously evolve language and other cognitive skills. Like the human auditory apparatus, the robot head contains a binaural sensor system based on a frequency-domain binaural model. This enables the robot to detect and locate a speaker autonomously from the speech signals the speaker produces. In humans, however, the temporal regularity of incoming sounds is analyzed on different time scales: periods in the millisecond range give rise to the sensation of pitch, while periods on the order of seconds give rise to the sensation of rhythm. In addition, detecting and localizing multiple sound signals, although easy for humans, remains a nontrivial problem for machine audition. We therefore discuss a possible implementation of human-like spatiotemporal processing of sounds in single-source and multi-source scenarios. Our future goals are to combine the constructed speech synthesis and physical audio systems, and to establish an algorithm for detailed spatiotemporal localization of both single and concurrent speech sound sources, with roughly human-like temporal and spatial processing capabilities.
binaural speech localization production humanoid
Robert Wolff, Mario Lasseck, Manfred Hild, Oscar Vilarroya, Tarik Hadzibeganovic
Labor für Neurorobotik, Institut für Informatik, Humboldt-Universität zu Berlin, Berlin, Germany; Unitat de Recerca en Neurociència Cognitiva, Departament de Psiquiatria i Medicina Legal, Universitat
International conference
Beijing
English
1-4
2009-06-11 (date first posted online on the Wanfang platform; not the paper's publication date)