A NEED TO REDEFINE WEB CONTENT OUTLIERS
The Web is a pivotal source of information for countless real-world applications. Because Web data is the collaborative result of many individuals and organizations, it is largely unstructured. As a result, content can be assigned to the wrong group, either through human error or through the devious intentions of contributors. Web Content Outlier Mining focuses on identifying pages that do not conform to the rest and flagging them as outliers. It is essentially a classification task: pages are extracted for a category of interest and their commonness is verified. Most existing research uses the textual content to verify the commonness among pages, and so ignores the nontextual content embedded within the pages. This paper explores the need to redefine web content outliers and proposes better ways to combat web page content spam.
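To make the text-only approach concrete, the sketch below illustrates the kind of baseline the abstract attributes to existing research. It is not the method proposed in this paper. Several choices are assumptions made for illustration. Each page's words are turned into a TF-IDF vector. Each page is then compared by cosine similarity with the combined vector of the other pages in the same category, a leave-one-out centroid. Pages scoring below a threshold (0.2 here) are flagged. The function names `tokenize`, `tf_idf_vectors`, `cosine`, and `content_outliers` are hypothetical. Because the sketch looks only at words, it cannot notice nontextual content such as images or embedded media, which is the gap the paper argues must be closed.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Hypothetical preprocessing: lowercase, keep alphabetic word tokens only.
    return re.findall(r"[a-z]+", text.lower())


def tf_idf_vectors(docs: list[str]) -> list[dict[str, float]]:
    token_lists = [tokenize(d) for d in docs]
    n = len(docs)
    # Document frequency: number of pages that contain each term at least once.
    df = Counter(term for tokens in token_lists for term in set(tokens))
    vectors = []
    for tokens in token_lists:
        tf = Counter(tokens)
        # Smoothed IDF (assumed variant): log((1 + n) / (1 + df)) + 1.
        vectors.append({t: c * (math.log((1 + n) / (1 + df[t])) + 1.0) for t, c in tf.items()})
    return vectors


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def content_outliers(pages: dict[str, str], threshold: float = 0.2) -> list[str]:
    """Return page names whose text is dissimilar to the rest of the category,
    least similar first. Text-only baseline; nontextual content is ignored."""
    names = list(pages)
    if len(names) < 2:
        return []
    vectors = tf_idf_vectors([pages[name] for name in names])
    # Sum of all page vectors in the category.
    total: dict[str, float] = {}
    for v in vectors:
        for t, w in v.items():
            total[t] = total.get(t, 0.0) + w
    scores = {}
    for name, v in zip(names, vectors):
        # Sum of the *other* pages' vectors. Cosine is scale-invariant, so this
        # points in the same direction as their mean (leave-one-out centroid).
        others = {t: w - v.get(t, 0.0) for t, w in total.items()}
        scores[name] = cosine(v, others)
    return sorted((n for n, s in scores.items() if s < threshold), key=lambda n: scores[n])


if __name__ == "__main__":
    category = {
        "p1": "football match goals league striker",
        "p2": "league football season goals coach",
        "p3": "striker scored goals in the football league match",
        "p4": "cheap pills discount pharmacy buy now",  # spam-like page
    }
    print(content_outliers(category))
```

Here the spam-like page `p4` shares no vocabulary with the football pages, so its similarity is 0 and it is the only page flagged at the default threshold. Raising the threshold makes the detector stricter and begins to flag legitimate but loosely related pages. Choosing this cutoff is the kind of fragile, text-dependent decision the paper questions.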
Web content mining; Content outliers; Page quality; Web content outlier; Anomaly detection; Content spam
KRISHNA CHAITANYA PENNETE
School of Information Science and Technology, Southwest Jiaotong University
International conference
Chengdu
English
1320-1327
2011-11-25 (date first posted online on the Wanfang platform; not the paper's publication date)