Conference Paper

A NEED TO REDEFINE WEB CONTENT OUTLIERS

The Web is a pivotal source of information for numerous real-world application scenarios. Because Web data is the collaborative result of many individuals and organizations, it is largely unstructured, which creates the possibility of content being assigned to the wrong group through human error or the devious intentions of contributors. Web content outlier mining focuses on identifying pages that do not conform to the rest and flagging them as outliers. This is essentially a classification task in which pages are extracted for a category of interest and their commonness is verified. Most existing research uses textual content to verify the commonness among pages, ignoring the non-textual content embedded within them. This paper explores the need to redefine web content outliers and proposes better ways to combat web page content spam.

Web content mining; Content outliers; Page quality; Web content outlier; Anomaly detection; Content spam

KRISHNA CHAITANYA PENNETE

School of Information Science and Technology, South West Jiao Tong University

International conference

2011 3rd International Conference on Computer Technology and Development (ICCTD2011)

Chengdu

English

1320-1327

2011-11-25 (date first made available online on the Wanfang platform; not the paper's publication date)