Text Classification Using Semi-Supervised Clustering
In this paper, mixture models are used to classify documents. The basic assumption is that each class in a document collection is composed of a number of mixture components. By identifying the components in the collection, the document classes can be distinguished from one another. A semi-supervised clustering method is proposed to identify the components (clusters), and unlabeled data is further used to produce more accurate clusters that correspond to the components of the document classes. Experimental results show that the proposed method outperforms a support vector machine (SVM) with a linear kernel and performs comparably to a Bayesian classifier with Expectation Maximization (EM) in text classification.
text classification; semi-supervised clustering; unlabeled data; Expectation Maximization
Wen Zhang, Taketoshi Yoshida, Xijin Tang
School of Knowledge Science, Japan Advanced Institute of Science and Technology, 1-1, Ashahidai, Tat; Lab for Internet Software Technologies, Institute of Software, Chinese Academy of Sciences, Beijing; Institute of Systems Science, Academy of Mathematics and Systems Science, Chinese Academy of Science
International conference
Beijing
English
197-200
2009-07-24 (date first posted online on the Wanfang platform; not the publication date of the paper)