A Refinement Framework for Cross Language Text Categorization
Cross language text categorization is the task of exploiting labelled documents in a source language (e.g., English) to classify documents in a target language (e.g., Chinese). In this paper, we investigate the use of a bilingual lexicon for cross language text categorization. To this end, we propose a novel refinement framework consisting of two stages. In the first stage, a cross language model transfer generates initial labels for documents in the target language. In the second stage, an expectation maximization algorithm based on the naive Bayes model refines these initial labels to yield the final document labels. Preliminary experimental results on collected corpora show that the proposed framework is effective.
Ke Wu, Bao-Liang Lu
Department of Computer Science and Engineering, Shanghai Jiao Tong University, 800 Dong Chuan Road, Shanghai 200240, China
International conference
4th Asia Information Retrieval Symposium (AIRS 2008)
Harbin
English
401-411
2008-01-16 (date first posted online on the Wanfang platform; not the paper's publication date)
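
The two-stage idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify how the model transfer works, so the sketch assumes the simplest possible stand-in, namely projecting labelled source-language word counts through the bilingual lexicon and training a naive Bayes model in the target vocabulary. The function names (`train_nb`, `posterior`, `project`, `refine`), the smoothing constant, the iteration count, and the toy romanized data are all hypothetical.

```python
import math
from collections import Counter


def train_nb(docs, soft_labels, classes, vocab, alpha=1.0):
    """Fit multinomial naive Bayes from soft labels (dict: class -> weight)."""
    prior = {c: alpha for c in classes}
    counts = {c: Counter() for c in classes}
    for doc, weights in zip(docs, soft_labels):
        for c in classes:
            w = weights.get(c, 0.0)
            prior[c] += w
            for tok in doc:
                if tok in vocab:
                    counts[c][tok] += w
    z = sum(prior.values())
    log_prior = {c: math.log(prior[c] / z) for c in classes}
    log_like = {}
    for c in classes:
        total = sum(counts[c].values()) + alpha * len(vocab)
        log_like[c] = {t: math.log((counts[c][t] + alpha) / total) for t in vocab}
    return log_prior, log_like


def posterior(doc, model, classes):
    """Return P(class | doc) under the model, normalized in log space."""
    log_prior, log_like = model
    scores = {
        c: log_prior[c] + sum(log_like[c][t] for t in doc if t in log_like[c])
        for c in classes
    }
    top = max(scores.values())
    exp = {c: math.exp(s - top) for c, s in scores.items()}
    z = sum(exp.values())
    return {c: v / z for c, v in exp.items()}


def project(doc, lexicon):
    """Assumed transfer step: map each source word to all its lexicon translations."""
    return [t for s in doc for t in lexicon.get(s, [])]


def refine(src_docs, src_labels, tgt_docs, lexicon, classes, iters=10):
    vocab = {t for d in tgt_docs for t in d}
    # Stage 1: transfer the source-side model and produce initial target labels.
    projected = [project(d, lexicon) for d in src_docs]
    hard = [{y: 1.0} for y in src_labels]
    model = train_nb(projected, hard, classes, vocab)
    post = [posterior(d, model, classes) for d in tgt_docs]
    # Stage 2: EM with naive Bayes on the target documents only.
    for _ in range(iters):
        model = train_nb(tgt_docs, post, classes, vocab)          # M-step
        post = [posterior(d, model, classes) for d in tgt_docs]   # E-step
    return [max(p, key=p.get) for p in post]


# Toy data (hypothetical): English source, romanized target tokens.
lexicon = {"goal": ["jinqiu"], "match": ["bisai"],
           "stock": ["gupiao"], "market": ["shichang"]}
src_docs = [["goal", "match"], ["match", "goal", "goal"],
            ["stock", "market"], ["market", "stock", "stock"]]
src_labels = ["sports", "sports", "finance", "finance"]
tgt_docs = [["jinqiu", "bisai", "qiudui"], ["gupiao", "shichang", "touzi"],
            ["qiudui", "bisai"], ["touzi", "gupiao"]]

predicted = refine(src_docs, src_labels, tgt_docs, lexicon, ["sports", "finance"])
print(predicted)
```

In this toy setup the words `qiudui` and `touzi` are missing from the lexicon, so stage 1 alone gives them no class signal; the EM stage is what lets them pick up class evidence from co-occurring words in the unlabelled target documents.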