A DIVIDE-CONQUER STRATEGY FOR ENGLISH TEXT CHUNKING

Abstract:

The traditional approach to English text chunking identifies all phrases with a single model that uses the same types of features for every phrase. It has been shown that relying on a single model has two limitations: the same feature types are not suitable for all phrases, and data sparseness may result. In this paper, a divide-conquer approach is proposed and applied to the identification of English phrases. This strategy divides the chunking task into several sub-tasks according to the sensitive features of each phrase type and identifies the different phrases in parallel. A two-stage decreasing conflict strategy is then used to synthesize the answers of the sub-tasks. When the approach is applied to and tested on the public training and test corpora, the divide-conquer strategy achieves an F score of 94.14% for identifying arbitrary phrases, compared with the previous best F score of 94.17%.
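The abstract does not say how the sub-task outputs are represented, merged, or scored. As a rough illustration only, the Python sketch below assumes that each phrase-type sub-task returns scored candidate spans. It replaces the paper's two-stage decreasing conflict strategy, whose details are not given here, with a simple greedy rule in which the most confident span wins. The function names `merge_subtask_outputs` and `chunk_f_score` are hypothetical. The F score is taken as the harmonic mean of chunk-level precision and recall, and a chunk counts as correct only when both its boundaries and its type match the reference.

```python
from typing import Dict, List, Tuple

# A chunk is (start, end, phrase_type), with end exclusive (token indices).
Chunk = Tuple[int, int, str]


def merge_subtask_outputs(
    outputs: Dict[str, List[Tuple[int, int, float]]]
) -> List[Chunk]:
    """Combine per-phrase-type sub-task outputs into one chunking.

    Hypothetical stand-in for conflict resolution: candidates are visited in
    order of decreasing confidence, and a span is rejected if it shares any
    token with a span that has already been accepted.
    """
    candidates = [
        (score, start, end, ptype)
        for ptype, spans in outputs.items()
        for start, end, score in spans
    ]
    # Highest confidence first; ties broken by earlier start for determinism.
    candidates.sort(key=lambda c: (-c[0], c[1]))

    accepted: List[Chunk] = []
    occupied = set()
    for score, start, end, ptype in candidates:
        tokens = set(range(start, end))
        if tokens & occupied:
            continue  # overlaps a more confident chunk, so drop it
        occupied |= tokens
        accepted.append((start, end, ptype))
    return sorted(accepted)


def chunk_f_score(predicted: List[Chunk], gold: List[Chunk]) -> float:
    """Chunk-level F score: harmonic mean of precision and recall."""
    pred, ref = set(predicted), set(gold)
    correct = len(pred & ref)
    if correct == 0:
        return 0.0
    precision = correct / len(pred)
    recall = correct / len(ref)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # "He reckons the current account deficit will narrow" (tokens 0..7).
    outputs = {
        "NP": [(0, 1, 0.95), (2, 6, 0.90)],
        "VP": [(1, 2, 0.92), (6, 8, 0.88), (5, 8, 0.40)],
    }
    gold = [(0, 1, "NP"), (1, 2, "VP"), (2, 6, "NP"), (6, 8, "VP")]
    merged = merge_subtask_outputs(outputs)
    print(merged, chunk_f_score(merged, gold))
```

In the usage example, the low-confidence VP candidate (5, 8) overlaps the accepted NP (2, 6) at token 5, so it is discarded. Because the NP and VP sub-tasks run independently, they can be executed in parallel before the merge step, which matches the divide-conquer idea described above.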

Keywords: text chunking; sensitive features; divide-conquer strategy

Authors: YING-HONG LIANG, NI-HONG WANG, JIAN-MIN SU, HONG-E REN

Affiliations: School of Information and Computer Engineering in North East Forestry University, Harbin 150040; MOE- School of Information and Computer Engineering in North East Forestry University, Harbin 150040

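Conference type: International conference

Conference name: 2006 International Conference on Machine Learning and Cybernetics (Fifth IEEE Forum on Machine Learning and Cybernetics)

Conference location: Dalian

Conference language: English

Pages: 3370-3375

Online publication date: 2006-08-13 (the date the paper was first posted on the Wanfang platform; not the paper's publication date)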
