Research on Text Classification Algorithm of Largest Dispersion Based on Term Frequency

Abstract:

To classify documents automatically according to the content of their web pages, this paper proposes a largest-dispersion text classification algorithm based on term frequency. The algorithm first applies a backward term frequency algorithm to typical texts of n types to determine a sound and effective characteristics set for each of the n types. Using these sets, it computes the classification values of a webpage document against the n characteristics sets with the largest dispersion algorithm and, by comparing the dispersions, obtains the largest one. The largest dispersion value is then compared with a relative threshold: if the value is larger than the threshold, the corresponding type is taken as the type of the webpage document; if it is smaller, the judgement about the document's type is considered invalid. The algorithm is robust and easy to use, and is very effective for large-scale collections of small documents.
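
The pipeline outlined in the abstract (per-type characteristics sets, a classification value per type, a largest-dispersion comparison, and a threshold test) can be illustrated with a minimal Python sketch. The abstract does not define the backward term frequency algorithm or the dispersion measure, so everything concrete here is an assumption made for illustration only: the characteristics set is taken to be the `top_k` most frequent terms of a type's typical texts, the classification value is the share of a document's tokens found in that set, and the dispersion of a type is its value minus the mean value over all types. The names `build_feature_sets`, `classify`, `top_k`, and `threshold` are hypothetical and do not come from the paper.

```python
from collections import Counter


def build_feature_sets(typical_texts, top_k=3):
    """Build one characteristics set per type.

    typical_texts maps a type name to a list of tokenized typical texts.
    ASSUMPTION: the set is simply the top_k most frequent terms; this is a
    stand-in for the paper's backward term frequency algorithm.
    """
    feature_sets = {}
    for category, docs in typical_texts.items():
        counts = Counter(token for doc in docs for token in doc)
        feature_sets[category] = {term for term, _ in counts.most_common(top_k)}
    return feature_sets


def classify(tokens, feature_sets, threshold=0.2):
    """Return the type with the largest dispersion, or None if the judgement is invalid.

    ASSUMPTION: classification value = share of document tokens in the set;
    dispersion = value minus the mean value over all types.
    """
    if not tokens or not feature_sets:
        return None
    values = {
        category: sum(token in terms for token in tokens) / len(tokens)
        for category, terms in feature_sets.items()
    }
    mean_value = sum(values.values()) / len(values)
    dispersion = {category: value - mean_value for category, value in values.items()}
    best = max(dispersion, key=dispersion.get)
    # Accept the type only when the largest dispersion exceeds the threshold.
    return best if dispersion[best] > threshold else None


if __name__ == "__main__":
    typical = {
        "sports": [["goal", "match", "team", "goal"], ["team", "match", "goal"]],
        "tech": [["code", "bug", "code", "server"], ["server", "code", "bug"]],
    }
    sets = build_feature_sets(typical)
    print(classify(["team", "goal", "match", "today"], sets))
    print(classify(["today", "weather", "code", "goal"], sets))
```

In this sketch a document whose terms match several types about equally yields a small largest dispersion and is rejected, which mirrors the abstract's rule that a value below the threshold makes the type judgement invalid.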

Keywords: text classification algorithm; largest dispersion algorithm; retrospect term frequency algorithm; characteristics set

Authors: An Junxiu; Jin Yuchang

Author affiliations: School of Software Engineering, Chengdu University of Information Technology (CUIT), Chengdu, 610025; School of Culture and Social Development, Southwest University, Chongqing, China; The College of Info

Conference type: International conference

Conference name: 2009 International Forum on Computer Science-Technology and Applications (IFCSTA 2009)

Conference location: Chongqing

Conference language: English

Pages: 400-403

Online publication date: 2009-12-25 (date first posted on the Wanfang platform; not necessarily the paper's publication date)
