Visual KEA: A visual model based on keywords extraction algorithm for hub pages

Abstract:

Automatically extracting keywords from web pages is important for focused crawlers (spiders). Much research already addresses automatic keyword extraction from content-intensive web pages. However, extracting keywords automatically from hyperlink-intensive web pages (hub pages) remains a challenge. Page authors often use various means of visual emphasis to make subject-related terms stand out. This paper therefore proposes a visual model of web pages, DOM-PIXEL, which treats each DOM leaf node of a web page as a pixel described by a visual vector in which each component corresponds to one means of visual emphasis; the pixel value is derived from the node's visual energy and reflects the relevance of the corresponding DOM node to the page's subject. The parts that the page author emphasized appear in a distinctive color in the DOM-PIXEL image, so the keyword extraction algorithm only needs to find these distinctively colored points automatically. Owing to the intrinsic noise resistance of DOM-PIXEL and its visual energy transfer rule, the visual-model-based keyword extraction algorithm (VisualKEA) proposed in this paper significantly improves performance on hub pages.
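The idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the components of the visual vector, the energy formula, or the transfer rule, so the tag set and weights in `EMPHASIS_WEIGHTS`, the weighted-sum energy in `visual_energy`, the "leaf inherits emphasis from its open ancestors" transfer rule, and the `threshold` parameter are all assumptions made for illustration only.

```python
from html.parser import HTMLParser

# Hypothetical emphasis tags and weights; the abstract does not specify
# the real visual vector components or how they are weighted.
EMPHASIS_WEIGHTS = {"b": 1.0, "strong": 1.0, "em": 0.5, "h1": 3.0, "h2": 2.0}
COMPONENTS = list(EMPHASIS_WEIGHTS)  # fixed component order of the vector


class DomPixelParser(HTMLParser):
    """Collects one (text, visual_vector) "pixel" per non-empty text leaf."""

    def __init__(self):
        super().__init__()
        self.open_emphasis = []  # emphasis tags currently open
        self.pixels = []         # list of (text, vector) pairs

    def handle_starttag(self, tag, attrs):
        if tag in EMPHASIS_WEIGHTS:
            self.open_emphasis.append(tag)

    def handle_endtag(self, tag):
        # Close the most recently opened matching emphasis tag, if any.
        for i in range(len(self.open_emphasis) - 1, -1, -1):
            if self.open_emphasis[i] == tag:
                del self.open_emphasis[i]
                break

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return
        # Assumed transfer rule: a leaf inherits every emphasis that is
        # open on one of its ancestors.
        vector = [1.0 if name in self.open_emphasis else 0.0
                  for name in COMPONENTS]
        self.pixels.append((text, vector))


def visual_energy(vector):
    """Assumed energy: weighted sum of the visual vector's components."""
    return sum(EMPHASIS_WEIGHTS[name] * value
               for name, value in zip(COMPONENTS, vector))


def extract_keywords(html, threshold=1.0):
    """Return leaf texts with energy >= threshold, highest energy first."""
    parser = DomPixelParser()
    parser.feed(html)
    parser.close()  # flush any buffered trailing text
    scored = [(text, visual_energy(vec)) for text, vec in parser.pixels]
    scored.sort(key=lambda pair: -pair[1])
    return [text for text, energy in scored if energy >= threshold]
```

On a link-heavy fragment such as `<li><a href="#">misc</a></li><li><b>spider</b></li>`, the unemphasized link text scores zero and is dropped, while the bold text passes the threshold. This mirrors the abstract's claim that emphasized nodes stand out against the noise of a hub page.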

Keywords: DOM-PIXEL; visual vector; visual energy; visual energy transfer rule; automatic keyword extraction

Authors: Hao Peng, Zhen Chen

Affiliation: Department of Computer Science and Technology, Hunan International Economics University, Changsha 410205, China

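Conference type: International conference

Conference name: 2012 2nd International Conference on Materials Science and Information Technology (MSIT2012)

Conference location: Xi'an

Conference language: English

Pages: 1593-1599

Online publication date: 2012-08-24 (the date the record first went online on the Wanfang platform; it does not represent the paper's publication date)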
