Improving Documents Classification with Semantic Features

Abstract:

Successful text classification depends heavily on the document representation used. Most current approaches adopt the bag-of-words (BOW) representation, in which the frequency of occurrence of each word is treated as the most important feature; this method ignores important semantic relationships between key terms. In this paper, we propose a system that uses ontologies and Natural Language Processing techniques to index texts. The traditional BOW matrix is replaced by a Bag of Concepts (BOC). For this purpose, we developed fully automated methods for mapping keywords to their corresponding ontology concepts. Support Vector Machine, a successful machine learning technique, is used for classification. Experimental results show that our proposed method does improve text classification performance significantly.

Keywords: text classification; ontology; RDF; SVM

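Authors: Bai Rujiang, Liao Junhua

Affiliation: Shandong University of Technology Library, Zibo, China

Conference type: International conference

Conference name: Second International Symposium on Electronic Commerce and Security (ISECS 2009)

Conference location: Nanchang

Conference language: English

Pages: 640-643

Online publication date: 2009-05-22 (date first posted on the Wanfang platform; not the paper's publication date)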
