Threshold Determining Method for Feature Selection
Feature selection is a key step in text categorization, and its results directly influence classification accuracy. Feature selection methods usually adopt an evaluation function to score feature words, and the feature words whose scores are higher than a set threshold are retained as the final feature subset. The threshold is therefore an important factor in feature selection. However, the threshold is very difficult to determine. In theory, there is no good solution. In practice, people often set an initial value based on experience and then adjust the threshold repeatedly according to the classification results. In such cases, the range to be searched is often too large for the threshold to be determined easily. To address the difficulty of threshold determination, this paper studies threshold determining methods for feature selection. First, based on an analysis of several common feature selection methods, the key questions of threshold determination are defined and an idea for determining the threshold is put forward. Then, in accordance with this idea, four threshold determining methods are designed based on the characteristics of the different feature selection methods. Experimental results show that the proposed methods are effective in improving classification performance. Analysis of the results also yields some useful conclusions.
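As a minimal illustration of the thresholding step the abstract describes, the sketch below scores each word with an evaluation function and keeps the words whose score is higher than a threshold. Document frequency is used here only as an assumed stand-in for the evaluation function; the toy documents, the function names, and the threshold value are hypothetical and are not taken from the paper, and this is not one of its four proposed methods.

```python
from collections import Counter


def document_frequency(docs):
    """Score each word by the number of documents that contain it.

    Assumption: document frequency stands in for the evaluation function.
    """
    scores = Counter()
    for doc in docs:
        # Count each word at most once per document.
        scores.update(set(doc.split()))
    return dict(scores)


def select_features(scores, threshold):
    """Keep the words whose score is strictly higher than the threshold."""
    return {word for word, score in scores.items() if score > threshold}


# Hypothetical toy corpus, for illustration only.
docs = [
    "stock market price rises",
    "stock price falls",
    "football match result",
    "football team wins match",
]

scores = document_frequency(docs)
selected = select_features(scores, threshold=1)
```

Raising the threshold shrinks the retained subset, which is why the choice of threshold matters: with `threshold=2` no word in this toy corpus survives.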
text classification; feature selection; threshold; threshold interval; iterative process
Yanling Li Li Song
Xian Research Institute of Hi-Technology School of Automativ Control, Northwestern Polytechnical Un Xian Research Institute of Hi-Technology Xian, China
International conference
Second International Symposium on Electronic Commerce and Security (ISECS 2009)
Nanchang
English
929-933
2009-05-22 (date first posted online on the Wanfang platform; not the paper's publication date)