Conference topic

Text Classification Using Semi-Supervised Clustering

In this paper, mixture models are used to classify documents. The basic assumption is that each class in a document collection is composed of a number of mixture components. By identifying the components in the collection, the document classes can be distinguished from one another. A semi-supervised clustering method is proposed to identify the components (clusters); in addition, unlabeled data is used to produce more accurate clusters that correspond to the components of the document classes. Experimental results show that the proposed method outperforms a support vector machine (SVM) with a linear kernel and performs comparably to a Bayesian classifier with Expectation Maximization (EM) in text classification.
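The abstract does not spell out the clustering procedure, so the following is only a minimal sketch of the general idea, assuming a seeded k-means-style refinement stands in for the proposed method. Each class is given several clusters (its "components") seeded from labeled documents, unlabeled documents then pull the centroids toward the true components, and a new document takes the class of its nearest cluster. All function names, the parameter `k_per_class`, and the toy 2-D "document vectors" are hypothetical and are not taken from the paper.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean(points):
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(col) / len(points) for col in zip(*points)]

def farthest_first(points, k):
    """Pick k spread-out starting centroids deterministically."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(sq_dist(p, c) for c in centroids)))
    return centroids

def fit(labeled, unlabeled, k_per_class=2, iters=10):
    """labeled: list of (vector, class); unlabeled: list of vectors.
    Returns a list of (centroid, class) pairs, several per class."""
    clusters = []
    for y in sorted({lab for _, lab in labeled}):
        pts = [x for x, lab in labeled if lab == y]
        for c in farthest_first(pts, min(k_per_class, len(pts))):
            clusters.append((list(c), y))
    for _ in range(iters):
        members = [[] for _ in clusters]
        for x, y in labeled:
            # Assumption: a labeled document may only join a cluster of its own class.
            own = [i for i, (_, cy) in enumerate(clusters) if cy == y]
            members[min(own, key=lambda i: sq_dist(x, clusters[i][0]))].append(x)
        for x in unlabeled:
            # Unlabeled documents join whichever cluster is nearest.
            best = min(range(len(clusters)),
                       key=lambda i: sq_dist(x, clusters[i][0]))
            members[best].append(x)
        # Recompute centroids; an empty cluster keeps its old centroid.
        clusters = [(mean(m) if m else c, y)
                    for m, (c, y) in zip(members, clusters)]
    return clusters

def predict(clusters, x):
    """Class of the cluster whose centroid is nearest to x."""
    return min(clusters, key=lambda cy: sq_dist(x, cy[0]))[1]

# Toy data: class "A" has components near (0, 0) and (10, 10),
# class "B" near (0, 10) and (10, 0), so one centroid per class would fail.
labeled = [([0, 0], "A"), ([10, 10], "A"), ([0, 10], "B"), ([10, 0], "B")]
unlabeled = [[1, 1], [9, 9], [1, 9], [9, 1]]
clusters = fit(labeled, unlabeled)
print([predict(clusters, x) for x in ([2, 2], [8, 9], [2, 8], [8, 1])])
```

The toy layout is chosen so that the two classes are not linearly separable, which illustrates why modeling a class as several components can help relative to a single prototype or a linear boundary. Real documents would be high-dimensional term vectors rather than 2-D points.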

text classification; semi-supervised clustering; unlabeled data; Expectation Maximization

Wen Zhang; Taketoshi Yoshida; Xijin Tang

School of Knowledge Science, Japan Advanced Institute of Science and Technology, 1-1, Asahidai, Tat; Lab for Internet Software Technologies, Institute of Software, Chinese Academy of Sciences, Beijing; Institute of Systems Science, Academy of Mathematics and Systems Science, Chinese Academy of Sciences

International conference

The Second International Conference on Business Intelligence and Financial Engineering (BIFE 2009)

Beijing

English

197-200

2009-07-24 (date first made available online on the Wanfang platform; not the publication date of the paper)