Text Classification Using Semi-Supervised Clustering

Abstract:

In this paper, mixture models are used to classify documents. The basic assumption is that each class in a document collection is composed of a number of mixture components. By identifying the components in the collection, the document classes can be distinguished from one another. A semi-supervised clustering method is proposed to identify the components (clusters); unlabeled data is further used to produce more accurate clusters that correspond to the components of the document classes. Experimental results show that the proposed method outperforms a support vector machine (SVM) with a linear kernel and performs comparably to a Bayesian classifier with Expectation Maximization (EM) in text classification.
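The general idea of letting labeled documents seed clusters and unlabeled documents refine them can be illustrated with a minimal sketch. This is not the paper's algorithm, which the abstract does not specify; it is a simple seeded k-means variant under several assumptions: documents are already represented as numeric vectors, each class has a single component (the paper allows several per class), and the names `fit_seeded_clusters` and `classify` are hypothetical.

```python
def sq_dist(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def mean(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def classify(vec, centroids):
    """Return the class label whose centroid is closest to vec."""
    return min(centroids, key=lambda label: sq_dist(vec, centroids[label]))


def fit_seeded_clusters(labeled, unlabeled, n_iter=10):
    """Seeded k-means sketch (assumption: one cluster per class).

    labeled:   list of (vector, class_label) pairs
    unlabeled: list of vectors
    Returns a dict mapping class_label -> centroid vector.
    """
    # Group the labeled vectors by class; they seed the clusters.
    by_class = {}
    for vec, label in labeled:
        by_class.setdefault(label, []).append(vec)
    centroids = {label: mean(vecs) for label, vecs in by_class.items()}

    for _ in range(n_iter):
        # Labeled vectors always stay in their own class's cluster.
        members = {label: list(vecs) for label, vecs in by_class.items()}
        # Each unlabeled vector joins the cluster with the nearest centroid.
        for vec in unlabeled:
            members[classify(vec, centroids)].append(vec)
        # Re-estimate centroids from labeled plus assigned unlabeled vectors.
        centroids = {label: mean(vecs) for label, vecs in members.items()}
    return centroids
```

With toy two-dimensional data, `fit_seeded_clusters([([0.0, 0.0], "a"), ([10.0, 10.0], "b")], [[1.0, 1.0], [9.0, 9.0]])` returns one centroid per class, and `classify` then assigns a new vector to the class of the nearest centroid. A mixture-model version would replace the hard nearest-centroid assignment with soft component probabilities.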

Keywords: text classification; semi-supervised clustering; unlabeled data; Expectation Maximization

Authors: Wen Zhang; Taketoshi Yoshida; Xijin Tang

Affiliations: School of Knowledge Science, Japan Advanced Institute of Science and Technology, 1-1, Ashahidai, Tat; Lab for Internet Software Technologies, Institute of Software, Chinese Academy of Sciences, Beijing; Institute of Systems Science, Academy of Mathematics and Systems Science, Chinese Academy of Sciences

Conference type: International conference

Conference name: The Second International Conference on Business Intelligence and Financial Engineering (BIFE 2009)

Conference location: Beijing

Conference language: English

Pages: 197-200

Online publication date: 2009-07-24 (date first posted on the Wanfang platform; not the paper's publication date)
