XML Structural Similarity Search Using MapReduce

Abstract:

XML is a de facto standard for web data exchange and information representation. Managing large volumes of XML data efficiently is a challenge for conventional techniques. To cope with large-scale data, the MapReduce computing framework has recently attracted growing attention in the database community as an efficient solution. In this paper, an efficient and scalable framework is proposed for XML structural similarity search on a large cluster with MapReduce. First, sub-structures of the XML structure are extracted in parallel from a large XML corpus stored on a large cluster. Then, Min-Hashing and locality-sensitive hashing techniques are developed on the distributed and parallel computing framework for efficient structural similarity search. An empirical study on a cluster with large real datasets demonstrates the effectiveness and efficiency of the approach.
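The pipeline described in the abstract (sub-structure extraction, Min-Hashing, then locality-sensitive hashing) can be illustrated with a minimal single-process sketch. It is not the paper's implementation: the choice of parent/child tag pairs as sub-structures, the MD5-based hash family, the constants `NUM_HASHES` and `BANDS`, and the function names `map_phase`, `reduce_phase`, and `candidate_pairs` are all assumptions made for illustration, and the map and reduce steps are plain Python functions rather than jobs on a real MapReduce cluster.

```python
import hashlib
import xml.etree.ElementTree as ET
from collections import defaultdict

NUM_HASHES = 32  # assumed Min-Hash signature length
BANDS = 8        # assumed number of LSH bands (4 rows per band)


def substructures(xml_text):
    """Return the root tag plus every parent/child tag pair.

    Assumption: tag pairs stand in for the paper's sub-structures.
    """
    root = ET.fromstring(xml_text)
    found = {root.tag}
    stack = [root]
    while stack:
        node = stack.pop()
        for child in node:
            found.add(f"{node.tag}/{child.tag}")
            stack.append(child)
    return found


def _hash(seed, item):
    # One member of an assumed hash family, selected by the seed.
    digest = hashlib.md5(f"{seed}:{item}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def minhash(items):
    """Min-Hash signature: the minimum hash value under each seeded function."""
    return [min(_hash(seed, x) for x in items) for seed in range(NUM_HASHES)]


def map_phase(doc_id, xml_text):
    """Map step: emit ((band index, band values), doc_id) for each LSH band."""
    signature = minhash(substructures(xml_text))
    rows = NUM_HASHES // BANDS
    for band in range(BANDS):
        key = (band, tuple(signature[band * rows:(band + 1) * rows]))
        yield key, doc_id


def reduce_phase(mapped):
    """Reduce step: documents sharing a band key become candidate pairs."""
    buckets = defaultdict(set)
    for key, doc_id in mapped:
        buckets[key].add(doc_id)
    candidates = set()
    for ids in buckets.values():
        ordered = sorted(ids)
        for i, first in enumerate(ordered):
            for second in ordered[i + 1:]:
                candidates.add((first, second))
    return candidates


def candidate_pairs(corpus):
    """Run the map and reduce steps in-process over {doc_id: xml_text}."""
    mapped = [kv for doc_id, text in corpus.items()
              for kv in map_phase(doc_id, text)]
    return reduce_phase(mapped)
```

Calling `candidate_pairs` on a dictionary of document IDs to XML strings returns the ID pairs that collide in at least one band. Documents with identical tag structure always collide, because their signatures are identical. In a real deployment the candidates would normally be verified with an exact similarity measure afterwards.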

Authors: Peisen Yuan, Chaofeng Sha, Xiaoling Wang, Bin Yang, Aoying Zhou, Su Yang

Affiliations: School of Computer Science, Fudan University, P.R.C Shanghai Key Laboratory of Intelligent Informati Shanghai Key Laboratory of Trustworthy Computing, Software Engineering Institute, East China Normal Shanghai Key Laboratory of Intelligent Information Processing, P.R.C Shanghai Key Laboratory of Trus

Conference type: International conference

Conference name: 11th International Conference, WAIM 2010 (11th International Conference on Web-Age Information Management)

Conference location: Jiuzhaigou

Conference language: English

Pages: 169-181

Online publication date: 2010-07-14 (date first posted on the Wanfang platform; not the paper's publication date)
