Conference topic

Page Query Language Generation for Structural Extraction

  Information on the Web is usually designed to be understood by human users rather than by machines, so it is not easy for a software agent alone to catalogue and extract Web information automatically. Based on these observations, we present an approach that uses human-guided operations to automatically generate a PQL query, which extracts the information fragments of interest from Web pages. PQL is an SQL-like query language focused on Web pages. A PQL query uses XPath expressions to locate the target HTML nodes. We develop a K-Medoid clustering algorithm that processes PQL queries to generate the structural extractions. The extracted information is structured as a relational table (in CSV format), which can be manipulated easily with spreadsheet software or a relational DBMS.
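The abstract does not show PQL syntax or the clustering step, so the following is only a minimal sketch of the general idea it describes: locating HTML nodes with XPath expressions and writing the result as a relational table in CSV. The page fragment, the function names `extract_rows` and `to_csv`, and the XPath expressions are all hypothetical, and Python's standard `xml.etree.ElementTree` (which supports only a subset of XPath and requires well-formed markup) stands in for whatever engine the authors use.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical, well-formed page fragment (not taken from the paper).
PAGE = """
<html><body>
  <div class="item"><span class="title">Book A</span><span class="price">10</span></div>
  <div class="item"><span class="title">Book B</span><span class="price">12</span></div>
</body></html>
""".strip()


def extract_rows(page, record_xpath, field_xpaths):
    """Return one row per node matched by record_xpath.

    Each column is the text of the first node matched by the
    corresponding field XPath, evaluated relative to the record node.
    """
    root = ET.fromstring(page)
    rows = []
    for node in root.findall(record_xpath):
        row = []
        for xp in field_xpaths:
            hit = node.find(xp)
            # A missing field becomes an empty cell.
            row.append(hit.text.strip() if hit is not None and hit.text else "")
        rows.append(row)
    return rows


def to_csv(header, rows):
    """Serialize a header and rows as CSV text."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buf.getvalue()


rows = extract_rows(
    PAGE,
    ".//div[@class='item']",
    ["span[@class='title']", "span[@class='price']"],
)
csv_text = to_csv(["title", "price"], rows)
print(csv_text)
```

The resulting CSV text can be opened in spreadsheet software or loaded into a relational database, which is the kind of downstream use the abstract mentions.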

PQL; Structural Extraction; Browser Extension

He Hu, Xiaoyong Du

School of Information, Renmin University of China; Key Laboratory of Data Engineering and Knowledge Engineering, MOE, Beijing, China

International conference

The 2014 10th International Conference on Natural Computation (ICNC 2014) and the 2014 11th International Conference on Fuzzy Systems and Knowledge Discovery (FSKD 2014)

Xiamen

English

614-618

2014-08-19 (date first posted online on the Wanfang platform; not the paper's publication date)