Conference Special Topic

MULTIPLE FEATURES FUSION METHOD FOR IDENTIFYING TEXT TOPIC BOUNDARIES

In general, a document can be regarded as a sequence of coherent units called discourse segments. Discovering the segment boundaries is an important task for many natural language processing applications. In this paper, we propose a new method for identifying topic boundaries in Chinese text based on multiple features fusion. Our approach first extracts multiple features of topic shift from the text. For each feature, we adopt a corresponding F-dotplotting model to calculate the boundary values of neighboring sentences. Subsequently, the useful features among these cues are automatically selected and combined by a statistical method based on logistic regression analysis to determine the topic boundaries. The experimental results show that the F-dotplotting method is more effective than the common dotplotting method, and that the multiple features fusion method based on the logistic regression model can effectively improve Chinese text topic segmentation performance.

Topic boundaries identification; multiple features fusion; logistic regression model; F-dotplotting method

YONG-DONG XU, GUANG-RI QUAN, YA-DONG WANG, ZHI-MING XU

School of Computer Science and Technology, Harbin Institute of Technology (Wei Hai), WeiHai 264209, School of Computer Science and Technology, Harbin Institute of Technology (Wei Hai), WeiHai 264209, School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China

International conference

2008 International Conference on Machine Learning and Cybernetics

Kunming

English

2950-2956

2008-07-12 (date first posted online on the Wanfang platform; this is not the paper's publication date)