Application of Web Page Classification in a Domain-specific Search Engine
Automatic web page classification can be used in domain-specific search engines to help users find specific information on the Internet more conveniently and precisely. The semantic similarity and noisy data in domain-specific web pages make traditional classifiers perform poorly on them. This paper proposes a dictionary-based multilingual web page classification method to improve classification performance. The method constructs a domain-specific dictionary to strengthen the domain-specific knowledge in the pages. An automatic encoding detection and integration method is also introduced into the classifier to extract Chinese and English information precisely from multilingual pages. After being verified experimentally, the method is integrated into a real domain-specific search engine, where it proves effective.
Web page classification; Search engine; Domain-specific knowledge; Dictionary
Chunyan Liang
North China Electric Power University
International conference
Shenyang
English
568-570
2012-07-27 (date first posted online on the Wanfang platform; not the paper's publication date)
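
The abstract names two ingredients, encoding detection for mixed Chinese/English pages and a domain-specific dictionary that strengthens domain terms, without giving implementation details. The following is only a minimal sketch of how such a pipeline might look, not the authors' method. The dictionary entries, the weights, the encoding fallback order (UTF-8, then GB18030), and the function names `decode_page` and `weighted_features` are all assumptions made for illustration.

```python
import re
from collections import Counter

# Hypothetical domain dictionary: term -> boost weight (illustrative values only).
DOMAIN_DICT = {"transformer": 3.0, "power grid": 3.0, "变压器": 3.0, "电网": 3.0}


def decode_page(raw: bytes) -> str:
    """Decode page bytes by trial: UTF-8 first, then GB18030.

    Assumption: pages are either UTF-8 or a GB-family encoding
    (GB18030 is a superset of GBK and GB2312).
    """
    for enc in ("utf-8", "gb18030"):
        try:
            return raw.decode(enc)
        except UnicodeDecodeError:
            continue
    # Last resort: keep what can be decoded and mark the rest.
    return raw.decode("utf-8", errors="replace")


def weighted_features(text: str) -> Counter:
    """Build a term-frequency vector in which dictionary terms are boosted."""
    text = text.lower()
    # Plain English word counts as the baseline features.
    feats = Counter(re.findall(r"[a-z]+", text))
    # Substring matching lets Chinese terms match without word segmentation.
    for term, weight in DOMAIN_DICT.items():
        hits = text.count(term)
        if hits:
            feats[term] = hits * weight
    return feats


raw = "变压器 transformer news".encode("gbk")
features = weighted_features(decode_page(raw))
```

The resulting feature vector could then be passed to any standard text classifier; the choice of classifier is not stated in the abstract and is left open here.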