Conference Papers

Improving Automatic Text Document Clustering via Selecting a Small Amount of Labeled Data

We investigate an approach that improves automatic text document clustering with the help of a small number of labeled documents. We propose an active learning approach that selects informative documents for which the user is asked to provide labels. The active learning is guided by the intermediate cluster structure discovered during the clustering process, in which each cluster is represented by a language model. The uncertainty of a document's cluster assignment serves as the clue for identifying informative documents. We have conducted extensive experiments on several real-world corpora, and the results demonstrate that the proposed framework is effective.
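The abstract does not specify how the cluster language models are estimated or how assignment uncertainty is measured, so the following is only a minimal sketch under stated assumptions, not the authors' method. It assumes each intermediate cluster is represented by a unigram language model with additive (Laplace) smoothing, that clusters have uniform priors, and that uncertainty is the entropy of a document's posterior over clusters. The function names (`train_cluster_lm`, `assignment_posterior`, `select_uncertain`) and the toy data are hypothetical.

```python
import math
from collections import Counter

def train_cluster_lm(docs, vocab, alpha=1.0):
    """Unigram language model for one cluster (assumed Laplace smoothing).
    docs: list of token lists currently assigned to the cluster."""
    counts = Counter(tok for doc in docs for tok in doc)
    denom = sum(counts.values()) + alpha * len(vocab)
    return {w: (counts[w] + alpha) / denom for w in vocab}

def assignment_posterior(doc, cluster_lms):
    """P(cluster | doc) under assumed uniform cluster priors, computed in log space.
    Every token in doc must appear in the shared vocabulary."""
    logliks = [sum(math.log(lm[w]) for w in doc) for lm in cluster_lms]
    m = max(logliks)  # subtract the max to avoid underflow in exp
    weights = [math.exp(l - m) for l in logliks]
    z = sum(weights)
    return [wt / z for wt in weights]

def entropy(probs):
    """Shannon entropy (nats) of a discrete distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def select_uncertain(docs, cluster_lms, k):
    """Indices of the k unlabeled documents with the most uncertain assignment."""
    scores = [entropy(assignment_posterior(d, cluster_lms)) for d in docs]
    return sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)[:k]

# Toy example: two intermediate clusters and a pool of unlabeled documents.
clusters = [
    [["stock", "market", "trade"], ["market", "price", "stock"]],
    [["goal", "match", "team"], ["team", "player", "goal"]],
]
pool = [["stock", "price"], ["team", "goal"], ["market", "team"]]
vocab = {w for c in clusters for d in c for w in d} | {w for d in pool for w in d}
lms = [train_cluster_lm(c, vocab) for c in clusters]
print(select_uncertain(pool, lms, k=1))  # [2]: "market team" fits both clusters equally
```

In this sketch the language models are re-estimated from whatever the current (intermediate) clustering is, so the documents chosen for labeling change as the clustering evolves; the selected documents would then be shown to the user, and their labels fed back into the next clustering pass.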

knowledge management; text mining; active learning; semi-supervised document clustering

Ruizhang Huang, Wai Lam

Dept. of Industrial & Systems Engineering, The Hong Kong Polytechnic University, Hung Hom, Kowloon; Dept. of Systems Engineering & Engineering Management, The Chinese University of Hong Kong, Shatin, H

International Conference

The 9th International Symposium on Knowledge and Systems Sciences, The 4th Asia-Pacific International Conference on Knowledge Management

Guangzhou

English

pp. 54-60

2008-12-11 (date first posted online on the Wanfang platform; not the paper's publication date)