Learning Semantic Similarity for Multi-label Text Categorization

Abstract:

Multi-label text categorization is a supervised learning task in which a document is associated with multiple labels simultaneously. Current multi-label text categorization approaches are limited when labelled text data, which is expensive to obtain, is scarce but unlabelled text data is abundant, because they cannot exploit the information in unlabelled text data. To address this problem, we use deep learning to learn word semantic similarity from unlabelled text data, and then incorporate the learned word semantic similarity into current multi-label text categorization approaches. We conduct experiments on the Slashdot and Tmc2007 datasets, and these experiments demonstrate that our proposed method greatly improves the performance of current multi-label text categorization approaches.
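To make the idea of incorporating learned word similarity into text features concrete, here is a minimal Python sketch. It is not the authors' method: the abstract does not say how the learned similarity is combined with existing approaches, so this assumes one common option, semantic smoothing of bag-of-words vectors, where each word's count is spread to similar words through a word-word similarity matrix computed as the cosine similarity of word embeddings. The toy embeddings, the `threshold` parameter, and the function names `cosine`, `similarity_matrix`, and `smooth` are hypothetical stand-ins for vectors learned from unlabelled text.

```python
import math

def cosine(u, v):
    """Cosine similarity between two dense vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def similarity_matrix(vocab, embeddings):
    """S[i][j] = cosine similarity of the vectors for vocab[i] and vocab[j].

    The embeddings are assumed to be pre-trained on unlabelled text.
    """
    return [[cosine(embeddings[wi], embeddings[wj]) for wj in vocab] for wi in vocab]

def smooth(bow, sim, threshold=0.5):
    """Hypothetical semantic smoothing of a bag-of-words vector.

    x'[i] = sum_j S[i][j] * x[j], keeping only pairs with S[i][j] >= threshold,
    so weakly related words do not leak weight into each other.
    """
    n = len(bow)
    return [
        sum((sim[i][j] * bow[j] for j in range(n) if sim[i][j] >= threshold), 0.0)
        for i in range(n)
    ]

# Toy 2-d embeddings standing in for learned word vectors (hypothetical values).
vocab = ["car", "automobile", "banana"]
embeddings = {
    "car": [1.0, 0.1],
    "automobile": [0.9, 0.2],
    "banana": [0.0, 1.0],
}

sim = similarity_matrix(vocab, embeddings)
doc = [1, 0, 0]  # term counts: the document mentions only "car"
x = smooth(doc, sim)
print([round(v, 3) for v in x])
```

In this toy run the smoothed vector gives "automobile" a weight close to that of "car" while "banana" stays at zero. The smoothed vectors could then be passed to any existing multi-label learner (for example, one binary classifier per label), so a test document that mentions only "car" can still match training documents labelled via "automobile", which is the kind of benefit the abstract attributes to exploiting unlabelled data.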

Authors: Li Li, Mengxiang Wang, Longkai Zhang, Houfeng Wang

Affiliation: Key Laboratory of Computational Linguistics, Peking University, Ministry of Education, Beijing, China

Conference type: International conference

Conference: Chinese Lexical Semantics 15th Workshop (CLSW 2014) (The 15th International Workshop on Chinese Lexical Semantics)

Location: Macau

Conference language: English

Pages: 260-269

Online publication date: 2014-06-09 (the date the record was first posted on the Wanfang platform; not the paper's publication date)
