Improved Automatic Keyphrase Extraction by Using Semantic Information

Abstract:

Keyphrases provide semantic metadata that give an overview of the content of a document, and they are used in many text-mining applications. This paper proposes a new method that improves automatic keyphrase extraction by using semantic information about candidate keyphrases. The method is realized in two stages. In the candidate selection stage, after all phrases are extracted from the document, a word sense disambiguation method is used to obtain the senses of the phrases; term conflation is then performed using case folding, stemming, and semantic relatedness between candidates. In the filtering stage, four features are computed for each candidate: the TF×IDF measure, which describes the specificity of a phrase; the first occurrence of the phrase in the document; the length of the phrase; and a coherence score, which measures the semantic relatedness between the phrase and the other candidates. A Naive Bayes scheme builds a prediction model from training data with known keyphrases, and then uses the model to calculate the overall probability for each candidate. We evaluate the semantically improved method against the well-known Kea system using a more effective, semantically enhanced evaluation method. The inter-domain experiment shows that the quality of keyphrase extraction can be improved significantly when semantic information is exploited. The intra-domain experiment shows that our method is competitive with the Kea++ algorithm and is not domain-specific.
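
To make the filtering stage more concrete, the following is a minimal sketch of how three of the four features (TF×IDF, first occurrence, and length) might be computed for one candidate phrase. It is not the authors' implementation: the function names, the smoothed base-2 IDF, and the normalization of first occurrence by document length are all assumptions made for illustration, and the coherence score is omitted because it depends on a semantic relatedness measure that the abstract does not specify.

```python
import math


def find_positions(tokens, words):
    """Return every start index at which the word sequence occurs in tokens."""
    n = len(words)
    return [i for i in range(len(tokens) - n + 1) if tokens[i:i + n] == words]


def candidate_features(phrase, doc_tokens, corpus_docs):
    """Compute illustrative features for one candidate phrase.

    phrase      -- candidate phrase as a space-separated string
    doc_tokens  -- the document as a list of tokens
    corpus_docs -- list of token lists used to estimate document frequency
    Returns a dict of features, or None if the phrase is not in the document.
    """
    words = phrase.lower().split()
    tokens = [t.lower() for t in doc_tokens]  # case folding
    positions = find_positions(tokens, words)
    if not positions:
        return None

    # Term frequency: occurrences relative to document length.
    tf = len(positions) / len(tokens)

    # Document frequency: number of corpus documents containing the phrase.
    df = sum(
        1 for d in corpus_docs
        if find_positions([t.lower() for t in d], words)
    )
    # Assumption: add-one smoothing so a phrase unseen in the corpus is defined.
    idf = math.log2((1 + len(corpus_docs)) / (1 + df))

    return {
        "tfidf": tf * idf,
        # Assumption: position of first occurrence as a fraction of the document.
        "first_occurrence": positions[0] / len(tokens),
        "length": len(words),  # number of words in the phrase
    }


if __name__ == "__main__":
    doc = ("keyphrase extraction improves text mining and "
           "keyphrase extraction helps search").split()
    corpus = [doc, "deep learning for vision".split(), "graph search methods".split()]
    print(candidate_features("keyphrase extraction", doc, corpus))
```

In a full pipeline these feature values, together with a coherence score, would be passed to a Naive Bayes classifier trained on documents with known keyphrases, and candidates would be ranked by the resulting probability.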

Keywords: keyphrase extraction; word sense disambiguation; semantic information

Authors: XiaoLing Wang, DeJun Mu, Jun Fang

Affiliation: Control and Networks Laboratory, School of Automation, Northwestern Polytechnical University, China

Conference type: International conference

Conference name: International Conference on Intelligent Computation Technology and Automation (ICICTA 2008)

Conference location: Changsha

Conference language: English

Pages: 1061-1065

Online publication date: 2008-10-20 (date first posted on the Wanfang platform; not the paper's publication date)
