Context Enhanced Keyword Extraction for Sparse Geo-Entity Relation from Web Texts
Recognizing geo-entity relations in rich text requires robust and effective keyword extraction. Compared with supervised learning methods, unsupervised methods attract more attention because they can capture dynamic feature variation in text and discover additional relation types. Frequency-based keyword extraction methods have been widely studied. However, they are difficult to apply directly to geo-entity keyword extraction because geo-entity relations are sparsely distributed in texts. Moreover, there are few studies on Chinese keyword extraction. This paper proposes a context-enhanced keyword extraction method. First, the contexts of geo-entities are enhanced to reduce the sparseness of terms. Second, two well-known frequency-based statistical methods (i.e., DF and Entropy) are used to build a large-scale corpus automatically from the enhanced contexts. Third, the lexical features and their weights are statistically determined from the corpus to sharpen the distinction between terms. Finally, all terms in the enhanced contexts are scored with the lexical features, and the most important terms are selected as the keywords of geo-entity pairs. Experiments are conducted on a large volume of real Chinese web texts. Compared with DF and Entropy, the presented method improves precision by 41% and 36%, respectively, in discovering sparsely distributed keywords, and yields 60% more correct keywords for geo-entity relation recognition.
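The two frequency-based baselines named in the abstract can be illustrated with a minimal sketch. It assumes that DF denotes document frequency (the number of contexts containing a term) and that Entropy is the Shannon entropy of a term's occurrences across contexts. The abstract does not give the paper's exact formulas, so the function names, the toy contexts, and both definitions are illustrative assumptions, not the authors' implementation.

```python
import math
from collections import Counter


def document_frequency(contexts):
    """Count, for each term, how many contexts contain it at least once."""
    df = Counter()
    for terms in contexts:
        df.update(set(terms))  # set() so a term counts once per context
    return df


def term_entropy(contexts):
    """Shannon entropy (in bits) of each term's occurrences across contexts.

    Assumed definition: p_i = count of the term in context i / total count.
    """
    per_context = [Counter(terms) for terms in contexts]
    totals = Counter()
    for counts in per_context:
        totals.update(counts)

    entropy = {}
    for term, total in totals.items():
        h = 0.0
        for counts in per_context:
            c = counts.get(term, 0)
            if c:
                p = c / total
                h -= p * math.log2(p)
        entropy[term] = h
    return entropy


# Hypothetical tokenized contexts, one list per geo-entity pair.
contexts = [
    ["river", "flows", "into", "lake"],
    ["river", "located", "in", "province"],
    ["city", "located", "in", "province"],
]
df = document_frequency(contexts)
entropy = term_entropy(contexts)
```

Under these assumed definitions, a term that occurs in only one context has entropy zero and a DF of one, which shows why purely frequency-based scores struggle with sparsely distributed relation keywords.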
Geographical information retrieval; Geo-entity relation; Keyword extraction; Text mining; Context enhancement
Li Yu; Feng Lu; Xueying Zhang; Xiliang Liu
State Key Laboratory of Resources and Environmental Information System, Institute of Geographic Scien; State Key Laboratory of Resources and Environmental Information System, Institute of Geographic Scien; Key Laboratory of Virtual Geography Environment, Nanjing Normal University, Nanjing 210046, China; State Key Laboratory of Resources and Environmental Information System, Institute of Geographic Scien
International conference
International Asia-Pacific Web Conference (18th)
Suzhou
English
253-264
2016-09-23 (date first made available online on the Wanfang platform; not the paper's publication date)