Conference topic

A DIVIDE-CONQUER STRATEGY FOR ENGLISH TEXT CHUNKING

The traditional approach to English text chunking identifies all phrases with a single model and the same types of features. This has two known limitations: one set of feature types does not suit every phrase type, and data sparseness may result. This paper proposes a divide-conquer approach and applies it to the identification of English phrases. The strategy divides the chunking task into several sub-tasks according to the sensitive features of each phrase type and identifies the different phrases in parallel. A two-stage conflict-reduction strategy then synthesizes the answers of the sub-tasks. Tested on the public training and test corpus, the divide-conquer strategy achieves an F score of 94.14% for arbitrary phrase identification, compared to the previous best F score of 94.17%.
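The pipeline described in the abstract can be pictured with a minimal Python sketch. It is not the authors' implementation: the POS-tag sets in `PHRASE_TAGS`, the rule-based `find_spans` identifier, and the concrete two stages in `merge` (keep conflict-free spans first, then settle overlaps longest-first) are all assumptions made for illustration, since the abstract does not specify the models or the conflict rules.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical POS-tag sets standing in for each phrase type's "sensitive features".
# VBG appears in both NP and VP so that the sub-tasks can disagree.
PHRASE_TAGS = {
    "NP": {"DT", "JJ", "NN", "NNS", "VBG"},
    "VP": {"MD", "VB", "VBD", "VBZ", "VBG"},
    "PP": {"IN"},
}

def find_spans(phrase_type, tags):
    """Sub-task: return (start, end, type) for each maximal run of this type's tags."""
    allowed = PHRASE_TAGS[phrase_type]
    spans, start = [], None
    for i, tag in enumerate(tags + [None]):  # the None sentinel closes a trailing run
        if tag in allowed:
            if start is None:
                start = i
        elif start is not None:
            spans.append((start, i, phrase_type))
            start = None
    return spans

def overlaps(a, b):
    """True if the half-open spans a and b share at least one token."""
    return a[0] < b[1] and b[0] < a[1]

def merge(candidates):
    """Assumed two-stage merge of the sub-task answers."""
    # Stage 1: accept every span that conflicts with no other candidate.
    kept = [s for s in candidates
            if not any(overlaps(s, o) for o in candidates if o is not s)]
    # Stage 2: settle the contested spans greedily, longest first.
    contested = [s for s in candidates if s not in kept]
    for span in sorted(contested, key=lambda s: (-(s[1] - s[0]), s[0])):
        if not any(overlaps(span, k) for k in kept):
            kept.append(span)
    return sorted(kept)

def chunk(tags):
    """Run one identifier per phrase type in parallel, then merge their answers."""
    with ThreadPoolExecutor() as pool:
        results = pool.map(lambda t: find_spans(t, tags), PHRASE_TAGS)
        candidates = [span for spans in results for span in spans]
    return merge(candidates)
```

For the tag sequence `["DT", "NN", "VBZ", "VBG", "IN", "DT", "NN"]`, the NP and VP sub-tasks both claim the `VBG` token; under the assumed rule the longer VP span wins, and `chunk` returns `[(0, 2, "NP"), (2, 4, "VP"), (4, 5, "PP"), (5, 7, "NP")]`.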

Text chunking; sensitive features; divide-conquer strategy

YING-HONG LIANG, NI-HONG WANG, JIAN-MIN SU, HONG-E REN

School of Information and Computer Engineering in North East Forestry University, Harbin 150040; MOE- School of Information and Computer Engineering in North East Forestry University, Harbin 150040

International conference

2006 International Conference on Machine Learning and Cybernetics (IEEE Fifth Forum on Machine Learning and Cybernetics)

Dalian

English

3370-3375

2006-08-13 (date first posted online on the Wanfang platform; does not represent the paper's publication date)