Construct the XQuery-based Wrapper For Extracting Web Data
Web pages provide a large amount of structured data, which many advanced applications require. However, existing approaches lack compatibility with these applications. This paper proposes a web data extraction model that builds an XQuery-based wrapper for extracting data from web pages. We first annotate data values with XPath expressions in the XML documents of sample pages. We then design an algorithm that generates XQuery statements, which extract data from the XML documents and output the results in a structured or semi-structured format. Since XQuery is a well-known standard for operating on XML data and is supported by most database systems and applications, our wrapper is highly compatible with most applications. Experimental results demonstrate that the proposed approach is feasible for extracting web data, which is important for web data integration.
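The step from XPath annotations to an XQuery statement can be illustrated with a minimal sketch. This is not the paper's algorithm, which the abstract does not specify. It assumes each field is annotated with one simple absolute XPath (no slashes inside predicates), treats the longest shared path prefix as the repeating record node, and emits a FLWOR expression. The names `common_prefix`, `build_xquery`, the `record` element, and the `page.xml` URI are all hypothetical.

```python
from typing import Dict, List


def common_prefix(paths: List[str]) -> List[str]:
    """Return the steps shared by all paths, never including a path's last step."""
    # Drop the final step of each path so every field keeps a relative path.
    split = [p.strip("/").split("/")[:-1] for p in paths]
    prefix: List[str] = []
    for steps in zip(*split):
        if all(s == steps[0] for s in steps):
            prefix.append(steps[0])
        else:
            break
    return prefix


def build_xquery(annotations: Dict[str, str], doc_uri: str = "page.xml") -> str:
    """Build a FLWOR query string from {field name: absolute XPath} annotations.

    Assumption: the shared prefix of the annotated paths identifies the
    repeating record node; each field is read relative to that node.
    """
    prefix = common_prefix(list(annotations.values()))
    record_path = "/" + "/".join(prefix)
    lines = [f'for $r in doc("{doc_uri}"){record_path}', "return", "  <record>"]
    for field, path in annotations.items():
        rel = "/".join(path.strip("/").split("/")[len(prefix):])
        lines.append(f"    <{field}>{{ data($r/{rel}) }}</{field}>")
    lines.append("  </record>")
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical annotations taken from one sample page.
    sample = {
        "title": "/html/body/div/ul/li/h2",
        "price": "/html/body/div/ul/li/span",
    }
    print(build_xquery(sample))
```

The generated string is plain XQuery, so under these assumptions it could be passed to any XQuery-capable processor or database; the Python code only assembles the statement and does not execute it.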
XQuery; wrapper; data extraction
Tiezheng Nie, Derong Shen, Ge Yu, Yue Kou, Dan Yang
College of Information Science and Engineering, Northeastern University, Shenyang, China
International conference
Shanghai
English
1838-1842
2011-07-26 (date first posted online on the Wanfang platform; not the paper's publication date)