Conference topic

TITLE EXTRACTION FROM LOOSELY STRUCTURED DATA RECORDS

In this paper, we present a novel method for extracting titles from Loosely Structured Data Records (LSDRs). We first automatically identify the format of the titles and then extract them accordingly. For a Web page whose title occurs in every data record, we select as the accurate title the candidate title that has the longest content in common. For a Web page whose title occurs before the first data record, we select as the accurate title the candidate title that has the longest differing content. Our experiments on two databases collected from the Internet demonstrate that the automatic algorithm is robust and effective.
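
One possible reading of the two selection rules in the abstract can be sketched in Python. This is an illustrative sketch only, not the authors' implementation. The abstract does not say how "same" and "different" content are measured, so the sketch assumes that each candidate is paired with a comparison text (the same field in another data record, or the same position on another page of the site) and that shared content is measured as the longest common substring. The names `shared_length` and `pick_title` are hypothetical.

```python
from difflib import SequenceMatcher


def shared_length(a: str, b: str) -> int:
    """Length of the longest common substring of a and b (assumed similarity measure)."""
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    return matcher.find_longest_match(0, len(a), 0, len(b)).size


def pick_title(candidates, title_in_every_record: bool) -> str:
    """Pick a title from (text, comparison_text) pairs.

    title_in_every_record=True : prefer the candidate with the most shared content.
    title_in_every_record=False: prefer the candidate with the most differing content.
    """
    def score(pair):
        text, other = pair
        shared = shared_length(text, other)
        return shared if title_in_every_record else len(text) - shared

    return max(candidates, key=score)[0]


# Hypothetical forum thread: each field of one record paired with the same
# field of another record. The repeated title shares the most content.
in_records = [
    ("Re: Cheap flights to Kunming", "Re: Cheap flights to Kunming"),
    ("posted by alice", "posted by bob"),
    ("I found a good deal", "Thanks, booking now"),
]

# Hypothetical text blocks before the first record, paired with the block at
# the same position on another page. Template text repeats; the title differs.
before_records = [
    ("Home > Forum > Travel", "Home > Forum > Travel"),
    ("Cheap flights to Kunming", "Best noodles in Yunnan"),
    ("Login", "Login"),
]

print(pick_title(in_records, title_in_every_record=True))
print(pick_title(before_records, title_in_every_record=False))
```

The pairing of each candidate with a single comparison text is a simplification chosen for brevity; comparing against several records or pages and aggregating the scores would follow the same pattern.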

Title extraction; Structured data records; Forum data; Loosely structured data records

YI-PU WU, XUE-JIE ZHANG, QING LI, JING CHEN

Department of Computer Science and Engineering, Yunnan University, Kunming 650091, China; Department of Computer Science and Engineering, Yunnan University, Kunming 650091, China; Department of Computer Science, City University of Hong Kong, Kowloon, Hong Kong

International conference

2008 International Conference on Machine Learning and Cybernetics

Kunming

English

2623-2628

2008-07-12 (date first made available online on the Wanfang platform; not the paper's publication date)