An Empirical Study on Harmonizing Classification Precision using IE Patterns

Abstract:

Web pages are conventionally represented for classification purposes by the words found within their contents. However, word-based web page representation suffers from several limitations, such as synonymy and homonymy. Motivated by these limitations, we explore the potential of representing web pages using information extraction patterns in addition to the words identified within the web contents. In this paper, we share the results and findings from our experiments. Our empirical study, conducted using the WebKB dataset, indicates that adding information extraction patterns to the web page representation helps to improve classification precision, especially in categories with highly diverse web content.

Keywords: web classification; web mining; information retrieval; information extraction

Authors: Lay-Ki Soon, Kyu-Baek Hwang, Sang Ho Lee

Affiliations: Faculty of Information Technology, Multimedia University, Cyberjaya, Selangor, Malaysia; Department of Computing, Soongsil University, Seoul, Korea

Conference type: International conference

Conference name: The 2nd International Conference on Software Engineering and Data Mining (IEEE SEDM 2010)

Conference location: Chengdu

Conference language: English

Pages: 171-176

Online publication date: 2010-06-23 (date first posted on the Wanfang platform; not the paper's publication date)
