Learning Semantic Similarity for Multi-label Text Categorization
Multi-label text categorization is a supervised learning task in which a document is associated with multiple labels simultaneously. Current multi-label text categorization approaches are limited when expensive labelled text data is scarce but unlabelled text data is abundant, because they cannot exploit information from the unlabelled data. To address this problem, we learn word semantic similarity from the unlabelled text data using deep learning, and then incorporate the learned similarity into current multi-label text categorization approaches. We conduct experiments on the Slashdot and Tmc2007 datasets, and the results demonstrate that the proposed method greatly improves the performance of current multi-label text categorization approaches.
Li Li, Mengxiang Wang, Longkai Zhang, Houfeng Wang
Key Laboratory of Computational Linguistics, Peking University, Ministry of Education, Beijing, China
International conference
Chinese Lexical Semantics 15th Workshop (CLSW 2014) (15th International Workshop on Chinese Lexical Semantics)
Macau
English
260-269
2014-06-09 (date first made available online on the Wanfang platform; not the paper's publication date)
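
The abstract describes learning word similarity from unlabelled text and feeding it into an existing multi-label classifier, but it does not say how the two are combined. The following is a minimal sketch of one plausible reading, not the authors' method: it assumes word vectors have already been learned, measures similarity by cosine, and expands a document's bag-of-words features with similar vocabulary words. The vectors in `WORD_VECTORS`, the function names `cosine` and `expand_features`, and the `threshold` parameter are all hypothetical and chosen only for illustration.

```python
import math

# Toy word vectors standing in for embeddings learned from unlabelled text.
# The numbers are invented for illustration; they are not from the paper.
WORD_VECTORS = {
    "linux": [0.9, 0.1, 0.0],
    "kernel": [0.8, 0.2, 0.1],
    "movie": [0.0, 0.9, 0.3],
    "film": [0.1, 0.8, 0.4],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def expand_features(tokens, vectors, threshold=0.9):
    """Build bag-of-words weights, then add each sufficiently similar
    vocabulary word with a weight equal to its similarity (assumed scheme)."""
    features = {}
    for tok in tokens:
        # Each observed token counts once.
        features[tok] = features.get(tok, 0.0) + 1.0
        if tok not in vectors:
            continue
        for other, vec in vectors.items():
            if other == tok:
                continue
            sim = cosine(vectors[tok], vec)
            if sim >= threshold:
                features[other] = features.get(other, 0.0) + sim
    return features


if __name__ == "__main__":
    # A document mentioning only "linux" also gains a weight for "kernel".
    print(expand_features(["linux"], WORD_VECTORS))
```

Under this assumption, the expanded feature dictionary would replace the raw bag-of-words input to whichever multi-label classifier is already in use, so a labelled document can inform predictions for documents that use related but different words.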