Conference Special Topic

A HYBRID APPROACH FOR WEB INFORMATION EXTRACTION

This paper presents a new approach to web information extraction based on maximum entropy and the maximum entropy Markov model. The approach not only overcomes the low precision and recall of the hidden Markov model, but also makes full use of the various kinds of contextual information available on the web. Experiments show that the hybrid approach achieves an average precision of 87.516%, whereas the hidden Markov model trained by the Baum-Welch algorithm achieves an average precision of 68.630%. This implies that the hybrid approach performs better than the hidden Markov model trained by the Baum-Welch algorithm.

Information extraction; Hidden Markov model; Maximum entropy; Maximum entropy Markov model; Generalized iterative scaling

JI-YI XIAO, DAO-HUI ZHU, LA-MEI ZOU

School of Computer Science and Technology, University of South China, Hengyang 421001, China

International conference

2008 International Conference on Machine Learning and Cybernetics

Kunming

English

1560-1563

2008-07-12 (date first made available online on the Wanfang platform; this does not represent the paper's publication date)