Removing fillers to induce semantic classes for a Chinese dialogue system

Abstract:

In this paper, we introduce an unsupervised method that removes fillers from spoken dialogues semi-automatically, based on their probability distribution. Disfluencies such as fillers and repairs often make a sentence ill-formed, longer, and harder to process. This paper focuses on fillers rather than repairs. We compute the unigram and bigram distributions of fillers on our Chinese voice search data and find that, using these distributions alone, fillers fall within the first 1% of all words. We offer a new perspective on the distribution of fillers and a new measure for detecting them in a natural dialogue corpus.
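As a rough illustration of the idea in the abstract, the sketch below counts unigrams and bigrams over tokenized utterances and keeps the most frequent 1% of the vocabulary as filler candidates for manual review. It is a minimal sketch under stated assumptions, not the authors' method: the abstract does not give the actual measure, so ranking by raw unigram count is an assumption, and the function name `filler_candidates`, the `fraction` parameter, and the toy utterances are all hypothetical.

```python
from collections import Counter
import math


def filler_candidates(utterances, fraction=0.01):
    """Return the top `fraction` of the vocabulary by unigram count.

    Assumption: frequency rank stands in for the paper's measure,
    which the abstract does not specify.
    """
    # Unigram counts over all tokens in all utterances.
    unigrams = Counter(w for utt in utterances for w in utt)
    # Bigram counts over adjacent token pairs within each utterance.
    bigrams = Counter(
        (a, b) for utt in utterances for a, b in zip(utt, utt[1:])
    )
    ranked = [w for w, _ in unigrams.most_common()]
    # Keep at least one word so tiny vocabularies still yield a candidate.
    k = max(1, math.ceil(len(ranked) * fraction))
    return ranked[:k], unigrams, bigrams


# Hypothetical, pre-tokenized toy utterances (not from the paper's data).
utterances = [
    ["嗯", "我", "想", "查", "天气"],
    ["嗯", "帮", "我", "找", "餐厅"],
    ["嗯", "嗯", "播放", "音乐"],
]
candidates, unigrams, bigrams = filler_candidates(utterances)
```

The candidate list is meant to be inspected by hand, which matches the semi-automatic character described in the abstract; the bigram counts are returned so that repeated or adjacent fillers can be examined alongside the unigram ranking.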

Keywords: filler detection; filler distribution; spoken dialogues

Authors: Yali Li, Yonghong Yan

Affiliation: ThinkIT Laboratory, Institute of Acoustics, Chinese Academy of Sciences, Beijing, China

Conference type: International conference

Conference: The 2nd IEEE International Conference on Advanced Computer Control (ICACC 2010)

Conference location: Shenyang

Conference language: English

Pages: 163-166

Online publication date: 2010-03-27 (date first posted on the Wanfang platform; not the paper's publication date)
