An Improved TextRank Keywords Extraction Algorithm

Abstract:

Keyword extraction is widely used in natural language processing, and extracting keywords quickly and accurately has become a key issue in text processing. Many keyword extraction methods exist, but their accuracy and generality still leave much room for improvement. This paper therefore proposes an improved TextRank keyword extraction algorithm. The algorithm uses the TF-IDF algorithm and the average information entropy algorithm to calculate the importance of words, and then combines these results into a comprehensive weight for each word in the text. The comprehensive weights are used to improve both the initial weights of the TextRank nodes and the node probability transfer matrix, and the weights of all nodes are then computed iteratively until convergence. The nodes are sorted by weight to obtain the weight information of the words, and the top N words are selected and output as the keywords. The experimental results show that, compared with the traditional TF-IDF and TextRank methods, the improved TextRank keyword extraction method proposed in this paper is more general and extracts keywords with higher accuracy.
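The abstract gives no formulas, so the following Python sketch only illustrates the general idea of biasing TextRank with per-word weights. Everything specific in it is an assumption rather than the authors' method: the function name `weighted_textrank`, the sliding-window co-occurrence graph, the use of normalized weights as initial scores, and the rule that a node passes its score to each neighbor in proportion to that neighbor's weight. The comprehensive weights are taken as an input (`word_weight`), because the abstract does not say how the TF-IDF and average information entropy scores are combined.

```python
from collections import defaultdict


def weighted_textrank(tokens, word_weight, window=3, damping=0.85,
                      tol=1e-6, max_iter=200, top_n=5):
    """Rank words with a TextRank variant biased by per-word weights.

    word_weight maps every word in tokens to a positive weight.
    How such a weight is derived from TF-IDF and entropy scores is
    not specified in the abstract, so it is supplied by the caller.
    """
    vocab = sorted(set(tokens))

    # Undirected co-occurrence graph: words within `window` positions
    # of each other are linked.
    neighbors = defaultdict(set)
    for i, w in enumerate(tokens):
        for v in tokens[i + 1:i + window]:
            if v != w:
                neighbors[w].add(v)
                neighbors[v].add(w)

    # Assumption: initial scores are the normalized word weights.
    total = sum(word_weight[w] for w in vocab)
    score = {w: word_weight[w] / total for w in vocab}

    # Assumption: node v sends its score to neighbor w with probability
    # weight(w) / (sum of weights of v's neighbors).
    out_sum = {w: sum(word_weight[v] for v in neighbors[w]) for w in vocab}

    for _ in range(max_iter):
        new = {}
        for w in vocab:
            incoming = sum(score[v] * word_weight[w] / out_sum[v]
                           for v in neighbors[w])
            new[w] = (1 - damping) / len(vocab) + damping * incoming
        delta = max(abs(new[w] - score[w]) for w in vocab)
        score = new
        if delta < tol:  # stop once the scores have converged
            break

    # Sort by score and keep the top N words as keywords.
    ranked = sorted(vocab, key=lambda w: score[w], reverse=True)
    return ranked[:top_n], score
```

A call such as `weighted_textrank(tokens, weights, window=3, top_n=10)` returns the top-ranked words together with the full score dictionary. Setting all weights equal reduces the sketch to ordinary unweighted TextRank on the same graph, which makes it easy to compare the two behaviors on a given text.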

Keywords: extraction; TF-IDF algorithm; TextRank algorithm; Average information entropy; Natural language processing

Authors: Suhan Pan, Zhiqiang Li, Juan Dai

Affiliation: College of Information Engineering, Yangzhou University, Yangzhou, Jiangsu, China

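Conference type: International conference

Conference name: 2019 China Turing Conference (ACM Turing Celebration Conference - China 2019)

Conference location: Chengdu

Conference language: English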

Pages: 777-783

Online publication date: 2019-05-17 (the date the record first appeared on the Wanfang platform, not the paper's publication date)
