Conference Topic

AN UNSUPERVISED LEARNING FRAMEWORK FOR DISCOVERING THE SITE-SPECIFIC ONTOLOGY FROM MULTIPLE WEB PAGES

We develop an unsupervised learning framework for automatically discovering the site-specific ontology from multiple pages of a Web site. To handle the uncertainty involved, our framework is built on a generative model of how the text fragments contained in the pages of a Web site are generated. One characteristic of our framework is that it considers clues from multiple pages collected from the Web site. Another is that it learns the regularities of the layout format, via stochastic grammatical inference, to discover the site-specific ontology. To accomplish ontology discovery, the ontology information blocks of a Web page are identified using site-invariant information. We have conducted extensive experiments on real-world Web sites and compared our framework with existing methods to demonstrate its effectiveness.

Ontology; Web mining; Text mining

TAK-LAM WONG; KAI-ON CHOW; FU LEE WANG

Department of Computer Science and Engineering, The Chinese University of Hong Kong, Shatin, Hong Kong; Department of Computer Science, City University of Hong Kong, 83 Tat Chee Avenue, Kowloon, Hong Kong

International conference

2008 International Conference on Machine Learning and Cybernetics

Kunming

English

1598-1603

2008-07-12 (date first posted online on the Wanfang platform; not the paper's publication date)