Webpage Segmentation based on Gomory-Hu Tree Clustering in Undirected Planar Graph
We propose a novel web page segmentation algorithm based on finding the Gomory-Hu tree of a planar graph. The algorithm first extracts visual and structural information from a web page to construct a weighted undirected graph whose vertices are the leaf nodes of the DOM tree and whose edges represent the visible positional relationships between vertices. It then partitions the graph with a Gomory-Hu tree based clustering algorithm. Experimental results show that our algorithm achieves much higher precision and recall than both VIPS and Chakrabarti et al.'s graph-theoretic algorithm, and that its running time is far lower than that of Chakrabarti et al.'s algorithm.
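The two steps named in the abstract (build a weighted undirected graph, then cluster it through its Gomory-Hu tree) can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give their weighting scheme or cut criterion, so the node names (`n1`, `m1`, ...), the edge weights, and the rule "drop the k-1 lightest tree edges" are all assumptions made for illustration. The tree is built with Gusfield's variant of the Gomory-Hu construction, using a plain Edmonds-Karp max flow.

```python
from collections import deque

def build_graph(edges):
    """Undirected weighted graph as a dict of dicts."""
    cap = {}
    for u, v, w in edges:
        cap.setdefault(u, {})[v] = w
        cap.setdefault(v, {})[u] = w
    return cap

def min_cut(cap, s, t):
    """Edmonds-Karp max flow. Returns (cut value, source-side node set)."""
    res = {u: dict(nbrs) for u, nbrs in cap.items()}  # residual capacities
    flow = 0
    while True:
        pred = {s: None}
        queue = deque([s])
        while queue and t not in pred:  # BFS for a shortest augmenting path
            u = queue.popleft()
            for v, c in res[u].items():
                if c > 0 and v not in pred:
                    pred[v] = u
                    queue.append(v)
        if t not in pred:
            return flow, set(pred)  # nodes still reachable from s
        path, v = [], t
        while pred[v] is not None:
            path.append((pred[v], v))
            v = pred[v]
        push = min(res[u][v] for u, v in path)
        for u, v in path:
            res[u][v] -= push
            res[v][u] += push
        flow += push

def gomory_hu_tree(cap):
    """Gusfield's construction. Returns tree edges (child, parent, weight)."""
    nodes = list(cap)
    root = nodes[0]
    parent = {v: root for v in nodes[1:]}
    weight = {}
    for s in nodes[1:]:
        t = parent[s]
        value, side = min_cut(cap, s, t)
        for v in nodes[1:]:
            # siblings that fall on s's side of the cut hang below s
            if v != s and v in side and parent[v] == t:
                parent[v] = s
        if t != root and parent[t] in side:
            # t's parent is on s's side: s takes t's place in the tree
            parent[s], parent[t] = parent[t], s
            weight[s], weight[t] = weight[t], value
        else:
            weight[s] = value
    return [(v, parent[v], weight[v]) for v in nodes[1:]]

def cluster(cap, k):
    """Assumed rule: remove the k-1 lightest tree edges, return components."""
    kept = sorted(gomory_hu_tree(cap), key=lambda e: e[2])[k - 1:]
    adj = {v: set() for v in cap}
    for u, v, _ in kept:
        adj[u].add(v)
        adj[v].add(u)
    seen, clusters = set(), []
    for start in cap:
        if start in seen:
            continue
        comp, stack = set(), [start]
        while stack:
            x = stack.pop()
            if x not in comp:
                comp.add(x)
                stack.extend(adj[x] - comp)
        seen |= comp
        clusters.append(comp)
    return clusters

# Hypothetical page: two visually tight blocks joined by one weak adjacency.
edges = [("n1", "n2", 5), ("n1", "n3", 5), ("n2", "n3", 5), ("n3", "m1", 1),
         ("m1", "m2", 5), ("m1", "m3", 5), ("m2", "m3", 5)]
graph = build_graph(edges)
segments = cluster(graph, 2)
print(segments)
```

In this toy graph the only weak link is the edge between `n3` and `m1`, so the lightest tree edge separates the two blocks. A real segmenter would derive the vertices and weights from rendered DOM leaf positions, which the abstract does not specify.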
Xinyue Liu, Xianchao Zhang, Ye Tian, Hongfei Lin
School of Electronic and Information Engineering, Dalian University of Technology, Dalian, China, 116024; School of Software, Dalian University of Technology, Dalian, China, 116620
International conference
Nanning
English
192-205
2009-12-04 (date first posted online on the Wanfang platform; not the paper's publication date)