An Improved TextRank Keywords Extraction Algorithm
Keyword extraction is widely used in natural language processing, and extracting keywords quickly and accurately has become a key issue in text processing. Many keyword extraction methods already exist, but their accuracy and versatility still leave much room for improvement. This paper therefore proposes an improved TextRank keyword extraction algorithm. The algorithm uses the TF-IDF algorithm and the average information entropy algorithm to calculate the importance of each word. It then combines these results into a comprehensive weight for each word in the text. The comprehensive weights are used to improve both the initial node weights of the TextRank algorithm and its node transition probability matrix, and the weights of all nodes are computed iteratively until convergence. Sorting the node weights gives a ranking of the words, and the top N words are selected and output as the keywords. Experimental results show that, compared with the traditional TF-IDF and TextRank methods, the proposed improved TextRank method is more general and extracts keywords with higher accuracy.
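The abstract gives the pipeline but not the exact formulas, so the following is only a minimal sketch of how the pieces could fit together. Several parts of it are assumptions and are not taken from the paper. The function names `tfidf_scores`, `entropy_scores` and `improved_textrank` are hypothetical, as are the parameters `alpha`, `window`, `d` and `tol`. "Average information entropy" is assumed here to mean the normalized entropy of a word's frequency distribution across the document's sentences. The comprehensive weight is assumed to be a linear blend controlled by `alpha`. In the transition matrix, a node is assumed to move to each neighbour in proportion to that neighbour's comprehensive weight, replacing TextRank's uniform split.

```python
import math
from collections import Counter, defaultdict


def tfidf_scores(doc_tokens, corpus):
    """TF-IDF of each word in the target document (smoothed IDF is an assumption)."""
    counts = Counter(doc_tokens)
    n_docs = len(corpus)
    scores = {}
    for word, count in counts.items():
        tf = count / len(doc_tokens)
        df = sum(1 for other in corpus if word in other)
        idf = math.log((1 + n_docs) / (1 + df)) + 1
        scores[word] = tf * idf
    return scores


def entropy_scores(sentences, vocab):
    """Assumed 'average information entropy': entropy of a word's frequency
    distribution over the document's sentences, normalized to [0, 1]."""
    n = len(sentences)
    scores = {}
    for word in vocab:
        freqs = [s.count(word) for s in sentences]
        total = sum(freqs)  # > 0 because vocab comes from the document itself
        h = -sum((f / total) * math.log(f / total) for f in freqs if f > 0)
        scores[word] = h / math.log(n) if n > 1 else 0.0
    return scores


def improved_textrank(sentences, corpus, top_n=5, alpha=0.5, window=2,
                      d=0.85, tol=1e-6, max_iter=100):
    """Hypothetical weighted TextRank: comprehensive word weights set the initial
    scores, the teleport term, and the transition probabilities."""
    tokens = [w for s in sentences for w in s]
    tfidf = tfidf_scores(tokens, corpus)
    ent = entropy_scores(sentences, tfidf.keys())
    max_tfidf = max(tfidf.values())
    # Assumed comprehensive weight: blend of max-normalized TF-IDF and entropy.
    weight = {w: alpha * tfidf[w] / max_tfidf + (1 - alpha) * ent[w] for w in tfidf}

    # Undirected co-occurrence graph: link words within `window` tokens in a sentence.
    neighbors = defaultdict(set)
    for s in sentences:
        for i, w in enumerate(s):
            for v in s[i + 1:i + window]:
                if v != w:
                    neighbors[w].add(v)
                    neighbors[v].add(w)

    total = sum(weight.values())
    prior = {w: weight[w] / total for w in weight}  # initial scores and teleport vector
    score = dict(prior)
    for _ in range(max_iter):
        new = {}
        for w in weight:
            # Node j passes score to neighbour w in proportion to w's comprehensive weight.
            inflow = sum(score[j] * weight[w] / sum(weight[k] for k in neighbors[j])
                         for j in neighbors[w])
            new[w] = (1 - d) * prior[w] + d * inflow
        delta = max(abs(new[w] - score[w]) for w in weight)
        score = new
        if delta < tol:  # stop once scores have converged
            break
    # Sort words by final score and keep the top N as keywords.
    return sorted(score, key=score.get, reverse=True)[:top_n]


if __name__ == "__main__":
    doc = [["keyword", "graph", "rank"], ["keyword", "text"], ["graph", "keyword", "node"]]
    corpus = [[w for s in doc for w in s], ["text", "node", "other"]]
    print(improved_textrank(doc, corpus, top_n=3))
```

One design note: if the teleport term were uniform, the initial scores would affect only how fast the iteration converges, not the converged ranking. For that reason this sketch also feeds the comprehensive weights into the teleport term. That choice is an assumption and may differ from the authors' formulation.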
Keywords: extraction; TF-IDF algorithm; TextRank algorithm; Average information entropy; Natural language processing
Suhan Pan, Zhiqiang Li, Juan Dai
College of Information Engineering, Yangzhou University, Yangzhou, Jiangsu, China
International conference
2019 China Turing Conference (ACM Turing Celebration Conference - China 2019)
Chengdu
English
777-783
2019-05-17 (date first posted online on the Wanfang platform; not the paper's publication date)