Template-based Delta Compression of Large Scale Web Pages
Delta compression techniques are commonly used in version control systems and on the World Wide Web. They compactly encode the differences between two files or strings in order to reduce communication or storage costs. In this paper, we study the use of delta compression for compressing large collections of web pages according to the similarity of their templates. We propose a framework for template-based delta compression that uses template-based clustering to find web pages with similar templates and then encodes their differences with delta compression to reduce storage cost. We also propose a filter-based optimization of the Diff algorithm to improve the efficiency of the delta compression approach. To demonstrate the efficiency of our approach, we present experimental results on a large collection of web pages. Our experiments show that template-based delta compression achieves a significantly better compression ratio than compressing each web page individually.
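The core idea of encoding one page against another page that shares its template can be illustrated with a minimal sketch. It assumes pages are split into lines and uses Python's standard `difflib.SequenceMatcher` as a stand-in matcher; it is not the paper's LCS-based Diff or its filter-based optimization, and the page contents and the names `make_delta` and `apply_delta` are hypothetical.

```python
import difflib

def make_delta(reference, target):
    """Encode target as copy/insert operations against reference (lists of lines)."""
    matcher = difflib.SequenceMatcher(a=reference, b=target, autojunk=False)
    delta = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            # Shared template content: store only a range into the reference.
            delta.append(("copy", i1, i2))
        elif tag in ("replace", "insert"):
            # Page-specific content: store the new lines literally.
            delta.append(("insert", target[j1:j2]))
        # "delete" needs no entry: those reference lines are simply not copied.
    return delta

def apply_delta(reference, delta):
    """Rebuild the target page from the reference page and a delta."""
    out = []
    for op in delta:
        if op[0] == "copy":
            out.extend(reference[op[1]:op[2]])
        else:
            out.extend(op[1])
    return out

# Hypothetical pages that share one template (navigation bar and footer).
template_page = [
    "<html>",
    "<div id='nav'>Home | News | About</div>",
    "<h1>Title A</h1>",
    "<p>Body of page A</p>",
    "<div id='footer'>Contact us</div>",
    "</html>",
]
other_page = [
    "<html>",
    "<div id='nav'>Home | News | About</div>",
    "<h1>Title B</h1>",
    "<p>Body of page B</p>",
    "<div id='footer'>Contact us</div>",
    "</html>",
]

delta = make_delta(template_page, other_page)
restored = apply_delta(template_page, delta)
```

In this sketch only the lines that differ from the reference page are stored literally, so the more template content two pages share, the smaller the delta becomes. This is why clustering pages by template before delta encoding is expected to help.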
LCS; Diff; delta compression; template
Kai Lei, Guangyu Sun, Lianen Huang
Shenzhen Key Lab for Cloud Computing Technology & Applications (SPCCTA), Shenzhen Graduate School, Peking University, Shenzhen 518055, P.R. China
International conference
Taiyuan
English
608-612
2013-04-06 (date first posted online on the Wanfang platform; not the paper's publication date)