A Framework for Multi-features based Web Harmful Information Identification
Xiao-Ping Tian; Guang-Gang Geng; Hong-Tao Li
Center of Information and Network Technology, Beijing Normal University, Beijing 100875, P. R. China
China Internet Network Information Center, Computer Network Information Center, Chinese Academy of Sciences, Beijing 100190, P. R. China

In recent years, the spread of harmful information such as pornography, phishing and violence has seriously disturbed the order of the Web, caused a series of adverse effects, and especially harmed young people's physical and mental health. Harmful information detection methods based on statistical learning, the current research focus, have shown their superiority because they adapt easily to newly developed harmful techniques. Feature selection is one of the key factors that influence the development of a Web harmful information detection system. This paper describes a novel framework for recognizing harmful Web pages. In this framework, multi-modal features are extracted, and each modality captures a different aspect of the spam information. Based on these features, we propose a feature fusion strategy. Considering the distribution of normal and harmful websites, we investigate the use of an ensemble under-sampling classification strategy to exploit the inherent imbalance of labels in this classification problem.
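The abstract names two techniques without detailing them: fusing multi-modal features and an ensemble under-sampling classifier for imbalanced labels. The following is a minimal sketch under stated assumptions, not the authors' method: fusion is assumed to be plain concatenation of per-modality vectors, the base learner is a hypothetical nearest-centroid classifier, label 1 is assumed to mark harmful (minority) pages and 0 normal pages, and ensemble members are combined by majority vote. All function names are hypothetical.

```python
import random
from statistics import mean


def fuse_features(*modal_vectors):
    """Assumed early fusion: concatenate per-modality feature vectors."""
    fused = []
    for vec in modal_vectors:
        fused.extend(vec)
    return fused


def train_centroid(samples, labels):
    """Hypothetical base learner: store the mean vector of each class."""
    centroids = {}
    for cls in set(labels):
        rows = [x for x, y in zip(samples, labels) if y == cls]
        centroids[cls] = [mean(col) for col in zip(*rows)]
    return centroids


def predict_centroid(centroids, x):
    """Return the class whose centroid is closest in squared Euclidean distance."""
    def sq_dist(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(centroids, key=lambda cls: sq_dist(centroids[cls]))


def train_under_sampling_ensemble(samples, labels, n_models=5, seed=0):
    """Train each member on all minority (harmful, label 1) samples plus an
    equally sized random subset of majority (normal, label 0) samples."""
    rng = random.Random(seed)
    minority = [(x, y) for x, y in zip(samples, labels) if y == 1]
    majority = [(x, y) for x, y in zip(samples, labels) if y == 0]
    models = []
    for _ in range(n_models):
        drawn = rng.sample(majority, min(len(minority), len(majority)))
        xs, ys = zip(*(minority + drawn))
        models.append(train_centroid(list(xs), list(ys)))
    return models


def predict_ensemble(models, x):
    """Majority vote: predict harmful (1) only if more than half the members do."""
    votes = sum(predict_centroid(m, x) for m in models)
    return 1 if votes * 2 > len(models) else 0
```

Each member sees every harmful example but only a balanced slice of the normal pages, so no single classifier is dominated by the majority class, while the ensemble as a whole still covers much of the normal data.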
International conference
Taiyuan
English
614-618
2010-10-22 (date first posted online on the Wanfang platform; not the paper's publication date)