Conference Topic

DESP: An Automatic Data Extractor on Deep Web Pages

We present DESP, an automatic data extractor for Deep Web pages in the book domain that extracts data items and labels their attributes at the same time. DESP's use case is extracting book information such as title, author, price, and publisher from the result pages returned by bookstore web sites. Although DESP targets a specific domain, its method is highly adaptive and can suit other domains. The system consists of two parts. The first is the Data Record Locater, whose Modified Data Locating algorithm overcomes the shortcoming of the MDR algorithm. The second is the Attribute Labeler, whose Detect Combine algorithm gives each data item a more explicit meaning.

edit distance; string similarity; algorithm; Web

Ji Ma, Derong Shen, TieZheng Nie

Department of Computer Science and Engineering, Northeastern University, Shenyang 110004, P.R. China

International conference

2010 Seventh Web Information System and Applications Conference (7th National Conference on Web Information Systems and Applications)

Hohhot

English

132-136

2010-08-20 (date first made available online on the Wanfang platform; not the paper's publication date)