The Keyword Extraction of Chinese Medical Web Page Based on WF-TF-IDF Algorithm
Web page keyword extraction is widely used in web text classification, text clustering, and information retrieval. However, keyword extraction for Chinese web pages still needs improvement and wider application, especially in the medical field. This paper proposes WF-TF-IDF, an improved TF-IDF algorithm, to extract keywords from Chinese medical web pages. The WF-TF-IDF algorithm considers three factors: word frequency in the title, word frequency in the description, and the distribution of words across categories in the corpus. We first preprocess the data, which includes web page denoising, regular expression processing, Chinese word segmentation, synonym replacement, and stop word filtering. We then extract keywords from the preprocessed result and filter out meaningless words among the extracted keywords according to their part of speech. The experimental results show that the WF-TF-IDF algorithm improves precision and recall by about 7% compared to the traditional TF-IDF algorithm.
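The abstract does not give the WF-TF-IDF formula, so the following is only a minimal sketch of the general idea: standard TF-IDF in which occurrences in the title and description count more than occurrences in the body. The function names, the field names (`title`, `description`, `body`), the weights `w_title` and `w_desc`, and the smoothed IDF form are all assumptions made for illustration, not the authors' method. The category-distribution factor mentioned in the abstract is omitted, and the input is assumed to be already segmented and stop-word filtered.

```python
import math
from collections import Counter


def weighted_tf_idf(doc, corpus, w_title=3.0, w_desc=2.0):
    """Score the tokens of one page with field-weighted TF-IDF.

    doc:    dict with 'title', 'description', 'body' token lists (hypothetical layout).
    corpus: list of token lists, one per document, used to compute IDF.
    """
    # Weighted term frequency: title/description hits count more (assumed weights).
    counts = Counter()
    for field, weight in (("title", w_title), ("description", w_desc), ("body", 1.0)):
        for token in doc[field]:
            counts[token] += weight
    total = sum(counts.values())

    n_docs = len(corpus)
    scores = {}
    for token, count in counts.items():
        # Document frequency: number of corpus documents containing the token.
        df = sum(1 for tokens in corpus if token in tokens)
        # Smoothed IDF (an assumed variant; avoids division by zero).
        idf = math.log((n_docs + 1) / (df + 1)) + 1.0
        scores[token] = (count / total) * idf
    return scores


def top_keywords(doc, corpus, k=5):
    """Return the k highest-scoring tokens as candidate keywords."""
    scores = weighted_tf_idf(doc, corpus)
    return sorted(scores, key=scores.get, reverse=True)[:k]
```

With these assumed weights, a term that appears in both the title and the description outranks a term of equal raw frequency that appears only in the body, which is the effect the title and description factors are meant to capture.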
keyword extraction; TF-IDF; Chinese medical web page; WF-TF-IDF
Peng Sun; LiHua Wang; Qianchen Xia
Beihang University, Software College, Beijing, China; Beihang University, Computer Science, Beijing, China
International conference
Nanjing
English
193-198
2017-10-12 (date first posted online on the Wanfang platform; not the publication date of the paper)