A Good All-around Semi-supervised Learning Algorithm for Information Categorization
This paper reports a study of information categorization based on efficient feature selection and a comprehensive semi-supervised learning algorithm. Feature selection and feature conversion, both linear and nonlinear, are performed by maximizing mutual information. Entropy is used, and extended, to identify suitable features with a machine learning method. A Fuzzy Partition Clustering Method is presented and used to automatically obtain a small number of labeled samples and some external clusters by measuring the similarity of documents correlated through clustering. These provide the categorization basis for supervised learning. Naive Bayes augmented learning is then combined with this to design and train the categorizers, and an estimate of the classification-error loss helps balance the selection of candidates. The all-around learning algorithm can greatly improve the precision and efficiency of web information categorization.
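As a rough illustration of the mutual-information feature selection step named in the abstract, the sketch below scores each term by the mutual information between its presence in a document and the class label, then keeps the highest-scoring terms. It is a minimal sketch under stated assumptions, not the authors' method: the toy corpus, the binary presence model, and the names `mutual_information` and `select_features` are all hypothetical, and the linear and nonlinear feature conversions mentioned in the abstract are not covered.

```python
import math
from collections import Counter


def mutual_information(docs, labels, term):
    """Return I(T; C) in bits, where T is presence of `term` in a
    document (True/False) and C is the document's class label."""
    n = len(docs)
    # Joint counts over (term present?, class label).
    joint = Counter((term in doc, label) for doc, label in zip(docs, labels))
    t_marg = Counter()
    c_marg = Counter()
    for (present, label), count in joint.items():
        t_marg[present] += count
        c_marg[label] += count
    mi = 0.0
    for (present, label), count in joint.items():
        p_joint = count / n
        # p(t, c) / (p(t) * p(c)) simplifies to count * n / (n_t * n_c).
        ratio = count * n / (t_marg[present] * c_marg[label])
        mi += p_joint * math.log2(ratio)
    return mi


def select_features(docs, labels, k):
    """Return the k terms with the highest mutual information.
    Ties are broken alphabetically so the result is deterministic."""
    vocab = sorted(set().union(*docs))
    ranked = sorted(
        vocab,
        key=lambda term: (-mutual_information(docs, labels, term), term),
    )
    return ranked[:k]


# Hypothetical toy corpus: each document is a set of terms.
docs = [
    {"goal", "match", "team"},
    {"team", "score", "goal"},
    {"stock", "market", "price"},
    {"market", "price", "team"},
]
labels = ["sports", "sports", "finance", "finance"]

top_terms = select_features(docs, labels, k=2)
print(top_terms)
```

On this toy corpus the terms goal, market, and price each separate the two classes perfectly and score 1 bit, while a term such as team, which appears in both classes, scores lower and is dropped first.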
web information categorization; dimensionality reduction; fuzzy clustering
Lizhen Liu Hai Chen Chao Du
Information Engineering College, CNU, Beijing, China
International conference
Shanghai
English
299-302
2009-11-20 (date first posted online on the Wanfang platform; not the paper's publication date)