A Modified Approach to Keyword Extraction Based on Word-similarity
Two approaches to keyword extraction are commonly used. One relies only on information about individual words, such as word frequency and TF.IDF; the other is based on the relationships between words. These relationships are usually described by word similarity, which is derived from a corpus (WordNet, HowNet) or a man-made thesaurus. With today's information explosion, the vocabulary in use is growing and changing rapidly, and many new words are not covered by man-made corpora. This paper proposes a new method for building a word-similarity thesaurus. Using the semantic information from the thesaurus, together with TF.IDF and each word's first occurrence, a keyword extraction algorithm is presented, along with results and analysis.
keyword extraction; word similarity; Jensen-Shannon divergence; Naive Bayes
Meng Wenchao Liu Lianchen Dai Ting
National CIMS Engineering Research Center, Tsinghua University, Beijing, China
International conference
Shanghai
English
2203-2207
2009-11-20 (date first made available online on the Wanfang platform; not the paper's publication date)
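
The abstract names TF.IDF and a word's first occurrence as features, and the keywords mention Jensen-Shannon divergence. The following is a minimal Python sketch of those three quantities only, not the authors' algorithm. The function names, the smoothed IDF formula, the normalization of first-occurrence position, and the use of base-2 logarithms are all assumptions made for illustration; the record does not specify them.

```python
import math
from collections import Counter


def tf_idf(doc_tokens, corpus):
    """Return {word: tf-idf score} for one tokenized document.

    `corpus` is a list of tokenized documents (lists of words).
    The smoothed idf used here is an assumption, not taken from the paper.
    """
    tf = Counter(doc_tokens)
    n_docs = len(corpus)
    scores = {}
    for word, count in tf.items():
        # Document frequency: number of documents containing the word.
        df = sum(1 for d in corpus if word in d)
        idf = math.log((1 + n_docs) / (1 + df)) + 1.0
        scores[word] = (count / len(doc_tokens)) * idf
    return scores


def first_occurrence(doc_tokens):
    """Relative position (0.0 = start of document) of each word's first appearance."""
    positions = {}
    for i, word in enumerate(doc_tokens):
        positions.setdefault(word, i / len(doc_tokens))
    return positions


def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2) between two discrete
    distributions given as {outcome: probability} dicts."""
    keys = set(p) | set(q)
    # Mixture distribution m = (p + q) / 2.
    m = {k: 0.5 * (p.get(k, 0.0) + q.get(k, 0.0)) for k in keys}

    def kl(a):
        # Kullback-Leibler divergence KL(a || m), skipping zero-probability terms.
        return sum(a[k] * math.log2(a[k] / m[k]) for k in a if a[k] > 0)

    return 0.5 * kl(p) + 0.5 * kl(q)
```

With base-2 logarithms the divergence lies between 0 (identical distributions) and 1 (disjoint support), so a value such as `1 - js_divergence(p, q)` could serve as a similarity score between two words' context distributions. How the paper actually maps divergence to similarity is not stated in this record.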