Conference topic

Design and Implement of Information Extraction System Based on XML

By studying the structure of HTML documents, this paper addresses the problem of Web information extraction using standard XML technology and proposes an XML-based information extraction method. The method first parses an HTML page and constructs an HTML DOM tree to clean the page and generate an XHTML document. It then analyzes the XHTML file through the DOM methods of Xerces-J and constructs an XPath generation algorithm. Finally, it uses the strengths of XSLT and XPath in data location and conversion to automatically learn and generate information extraction rules, and performs Web information extraction according to the generated XPath.
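The learn-then-extract idea in the abstract can be illustrated with a minimal sketch. It is not the paper's algorithm: it assumes the page has already been cleaned into well-formed XHTML without namespaces, it uses Python's standard xml.etree.ElementTree in place of Xerces-J, and the helper names build_xpath, learn_rule, and extract are hypothetical. A positional path is learned from one sample page where the target text is known, then reused on another page with the same layout.

```python
import xml.etree.ElementTree as ET


def build_xpath(root, target):
    """Build a positional path from root down to target (hypothetical helper)."""
    # ElementTree has no parent pointers, so map each child to its parent first.
    parent_of = {child: parent for parent in root.iter() for child in parent}
    steps = []
    node = target
    while node is not root:
        parent = parent_of[node]
        # Position is counted among siblings that share the same tag (1-based).
        same_tag = [sib for sib in parent if sib.tag == node.tag]
        steps.append(f"{node.tag}[{same_tag.index(node) + 1}]")
        node = parent
    return "./" + "/".join(reversed(steps)) if steps else "."


def learn_rule(sample_xhtml, sample_text):
    """Find the first element whose text equals sample_text and return its path."""
    root = ET.fromstring(sample_xhtml)
    for elem in root.iter():
        if (elem.text or "").strip() == sample_text:
            return build_xpath(root, elem)
    raise ValueError("sample text not found in the sample page")


def extract(page_xhtml, xpath):
    """Apply a learned path to another page and return the matched text."""
    root = ET.fromstring(page_xhtml)
    match = root.find(xpath)
    return None if match is None else (match.text or "").strip()


# Illustrative pages (made up for this sketch) that share one layout.
SAMPLE_PAGE = (
    "<html><body><div><p>nav</p></div>"
    "<div><h1>Title A</h1><p>Price: 10</p></div></body></html>"
)
OTHER_PAGE = (
    "<html><body><div><p>menu</p></div>"
    "<div><h1>Title B</h1><p>Price: 25</p></div></body></html>"
)

rule = learn_rule(SAMPLE_PAGE, "Price: 10")
value = extract(OTHER_PAGE, rule)
```

A purely positional path like this is brittle when the layout shifts between pages. It is used here only because it is the simplest rule that can be learned from a single labeled example.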

Information Extraction; XML; XPath; XSLT; Extraction Rule

Yanyan Xuan, Yan Hu

Dept. of Computer Science & Technology, Wuhan University of Technology, Wuhan, 430070, China

International conference

2008 International Symposium on Distributed Computing and Applications for Business Engineering and Science

Dalian

English

1400-1404

2008-07-27 (date first made available online on the Wanfang platform; not the paper's publication date)