STUDY ON THE CONSTRUCTION OF DOMAIN TEXT CLASSIFICATION MODEL WITH THE HELP OF DOMAIN KNOWLEDGE

Abstract:

Traditional text classification models use statistical methods to obtain features. However, when distinguishing domain texts from non-domain texts, these methods do not take domain knowledge relations into account. This paper presents a domain text classification model. The model uses the support vector machine learning algorithm, obtains domain classification feature words by combining statistically selected words with domain words, and constructs the domain classification feature space from them. It then uses domain knowledge relations to compute the relevance between domain concepts and derive the domain classification feature weights, on which the final domain text classification is based. An experiment in the Yunnan tourism domain confirms that domain knowledge relations have a positive effect on domain text classification: the classification accuracy is 0.04 higher than that of the improved TFIDF method.
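
As a rough illustration of the weighting idea in the abstract, the Python sketch below builds a feature space from the union of statistically selected words and domain words, computes plain TF-IDF weights, and then scales each feature by a domain relevance score. The abstract does not give the formulas, so the smoothed IDF, the multiplicative scaling `1 + alpha * relevance`, the function names, the `relevance` scores, and the toy tokens are all assumptions made for illustration, not the authors' method. The resulting vectors could then be passed to any support vector machine implementation.

```python
import math
from collections import Counter


def tfidf_vectors(docs, vocabulary):
    """Plain TF-IDF over a fixed vocabulary; each doc is a list of tokens."""
    n_docs = len(docs)
    vocab_set = set(vocabulary)
    df = Counter()
    for doc in docs:
        # Count each vocabulary term at most once per document.
        df.update(set(doc) & vocab_set)
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vec = []
        for term in vocabulary:
            # Smoothed IDF (an assumption; the paper's variant is not given).
            idf = math.log((1 + n_docs) / (1 + df[term])) + 1.0
            vec.append(tf[term] * idf)
        vectors.append(vec)
    return vectors


def domain_weighted(vectors, vocabulary, relevance, alpha=1.0):
    """Scale each feature by (1 + alpha * relevance[term]).

    `relevance` is a hypothetical mapping from a term to a score in [0, 1]
    standing in for the concept relevance derived from domain knowledge
    relations. Terms without a score keep their TF-IDF weight unchanged.
    """
    scale = [1.0 + alpha * relevance.get(term, 0.0) for term in vocabulary]
    return [[w * s for w, s in zip(vec, scale)] for vec in vectors]


if __name__ == "__main__":
    # Toy data: statistical feature words unioned with domain words.
    statistical_words = ["price", "hotel"]
    domain_words = ["lijiang", "hotel"]
    vocabulary = sorted(set(statistical_words) | set(domain_words))

    docs = [["lijiang", "hotel", "price"], ["price", "stock"]]
    relevance = {"lijiang": 1.0, "hotel": 0.5}  # made-up scores

    base = tfidf_vectors(docs, vocabulary)
    print(domain_weighted(base, vocabulary, relevance))
```

With this scaling, terms tied to domain concepts contribute more to the feature vector than purely statistical terms with the same TF-IDF weight, which is one simple way to let domain knowledge influence the classifier input.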

Keywords: Text classification; Feature selection; Domain knowledge; Relations

Authors: ZHENG-TAO YU; LU HAN; CUN-LI MAO; JIAN-YI GUO; XIANG-YAN MENG; ZHI-KUN ZHANG

Affiliations: The School of Information Engineering and Automation, Kunming University of Science and Technology; The Institute of Intelligent Information Processing, Computer Technology Application Key Laboratory

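Conference type: International conference

Conference name: 2008 International Conference on Machine Learning and Cybernetics

Conference location: Kunming

Conference language: English

Pages: 2612-2617

Online publication date: 2008-07-12 (date first posted online on the Wanfang platform; not the paper's publication date)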
