Conference Papers

General Chinese Webpage Content Extraction Research and Application

  Webpage text extraction is the process of extracting structured data that meets research requirements from semi-structured webpages, and it is the basis of a wide range of web data mining and search applications. After a general introduction to webpage information extraction technology, the paper surveys the main existing webpage extraction algorithms and summarizes the advantages and disadvantages of each, as well as the problems that urgently need solutions. Finally, it adds the authors' own thoughts on the problem and points out directions for follow-up research.

webpages; text; information extraction; structure

GUO Dongxu, WU Peng

School of Information Management, Nanjing University of Science and Technology, Nanjing 210094, China

International Conference

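The First International Conference on Information Acquisition and Knowledge Services, held jointly with the Sixth Workshop on Search Behavior and User Cognition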

Wuhan

English

126-131

2014-10-10 (date first posted online on the Wanfang platform; not the paper's publication date)