Conference Paper

Compression of Inverted Index for Comprehensive Performance Evaluation in Lucene

The inverted index is the most popular index structure in search engines. Index compression reduces the storage space of an inverted index and improves search performance. In this paper, we present a comprehensive performance evaluation of three state-of-the-art index compression schemes in the open-source information retrieval system Lucene. We focus on compressing and storing the document ID, frequency, and position information of Lucene's word-level inverted index. The main work covers: 1) the impact of the if-then-else construction in the decompression process on performance in a Java environment; 2) the compression ratio of the algorithms on data sets of different scales; 3) a performance comparison of term search and phrase search; and 4) whether interleaving the index files leads to notable differences in compression ratio and decompression speed. The experimental results and analysis are given in detail.
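The abstract does not name the three compression schemes it evaluates. As a hedged illustration of what compressing document IDs and frequencies involves, the Java sketch below shows a common baseline. It delta-encodes sorted document IDs into d-gaps and writes each gap and its term frequency as a variable-byte (VByte) integer, interleaved in a single stream. The class and method names (`PostingsVByteSketch`, `encode`, `decode`) are hypothetical. This is a generic technique, not the paper's schemes or Lucene's exact on-disk format, and position information is omitted.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;

/**
 * Hypothetical sketch (not the paper's code): d-gap + variable-byte coding of a
 * postings list, with (docGap, freq) pairs interleaved in one byte stream.
 * Assumes doc IDs are sorted ascending and all values are non-negative.
 */
public class PostingsVByteSketch {

    // Write one non-negative int using 7 data bits per byte; high bit set = more bytes follow.
    static void writeVInt(ByteArrayOutputStream out, int value) {
        while ((value & ~0x7F) != 0) {
            out.write((value & 0x7F) | 0x80);
            value >>>= 7;
        }
        out.write(value);
    }

    // Read one VInt starting at pos[0], advancing the shared cursor past it.
    static int readVInt(byte[] buf, int[] pos) {
        int value = 0;
        int shift = 0;
        byte b;
        do {
            b = buf[pos[0]++];
            value |= (b & 0x7F) << shift;
            shift += 7;
        } while ((b & 0x80) != 0);
        return value;
    }

    // Encode sorted doc IDs as gaps, interleaving each gap with its frequency.
    static byte[] encode(int[] docIds, int[] freqs) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int prev = 0;
        for (int i = 0; i < docIds.length; i++) {
            writeVInt(out, docIds[i] - prev); // d-gap: small values for dense lists
            writeVInt(out, freqs[i]);
            prev = docIds[i];
        }
        return out.toByteArray();
    }

    // Decode n (docId, freq) pairs; returns {docIds, freqs}.
    static int[][] decode(byte[] buf, int n) {
        int[] docIds = new int[n];
        int[] freqs = new int[n];
        int[] pos = {0};
        int prev = 0;
        for (int i = 0; i < n; i++) {
            prev += readVInt(buf, pos); // running sum restores absolute doc IDs
            docIds[i] = prev;
            freqs[i] = readVInt(buf, pos);
        }
        return new int[][] {docIds, freqs};
    }

    public static void main(String[] args) {
        int[] docs = {3, 7, 200, 1000};
        int[] freqs = {1, 2, 1, 5};
        byte[] enc = encode(docs, freqs);
        int[][] dec = decode(enc, docs.length);
        System.out.println(enc.length + " bytes");
        System.out.println(Arrays.toString(dec[0]) + " " + Arrays.toString(dec[1]));
    }
}
```

For the four postings in `main`, the encoded stream is 10 bytes, compared with 32 bytes for the same eight values stored as 4-byte ints. The byte-at-a-time loop in `readVInt` tests a branch condition on every byte. Branch-heavy decoding of this kind is one plausible reason why the control-flow structure of a decompressor can affect speed on the JVM. Storing gaps and frequencies in separate streams instead of interleaving them is the kind of layout choice the abstract's fourth item compares.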

inverted index; index compression; performance evaluation; search engine; Lucene

Xianghua Xu, Shengyi Pan, Jian Wan

Grid and Service Computing Lab, School of Computer Science and Technology, Hangzhou Dianzi University, Hangzhou 310037, China

International Conference

The Third International Joint Conference on Computational Science and Optimization (CSO 2010)

Huangshan

English

pp. 382-386

2010-05-28 (date first made available online on the Wanfang platform; not the paper's publication date)