A NEED TO REDEFINE WEB CONTENT OUTLIERS

Abstract:

The Web is a pivotal source of information for countless real-world application scenarios. Because Web data is the collaborative product of many individuals and organizations, it is largely unstructured, which creates the possibility of content being assigned to the wrong group, whether through human error or through the devious intentions of contributors. Web content outlier mining focuses on identifying pages that do not conform to the rest and flagging them as outliers. It is essentially a classification task in which pages are extracted for a category of interest and their commonness is verified. Most existing research uses only textual content to verify the commonness among pages, thus ignoring the non-textual content embedded within them. This paper explores the need to redefine web content outliers and proposes better ways to combat web page content spam.

Keywords: Web content mining; Content Outliers; Page quality; Web Content Outlier; Anomaly detection; Content Spam

Author: KRISHNA CHAITANYA PENNETE

Affiliation: School of Information Science and Technology, Southwest Jiaotong University

Conference type: International conference

Conference name: 2011 3rd International Conference on Computer Technology and Development (ICCTD 2011)

Conference location: Chengdu

Conference language: English

Pages: 1320-1327

Online publication date: 2011-11-25 (date first posted on the Wanfang platform; not the paper's publication date)
