Join Directly on Heavy-Weight Compressed Data in Column-Oriented Database
Operating directly on compressed data can decrease CPU costs. Many light-weight compression schemes, such as run-length encoding and bit-vector encoding, gain this benefit easily. The heavy-weight Lempel-Ziv (LZ) encoding, however, has had no method for operating directly on compressed data. We propose a join algorithm, LZ join, which joins two relations R and S directly on compressed data while decoding. R is treated as the probe table and S as the build table, and R is encoded with LZ. When R probes S, LZ join decreases the join cost by reusing cached results: when the decoder finds that an ID sequence of R already occurs in R's LZ dictionary window, the earlier join results for those IDs are reused instead of probing S again. LZ join combines the decoding and join phases into one, which reduces both the memory needed to decode the whole of R and the CPU overhead of probing for IDs whose results are already cached. Our analysis and experiments show that LZ join performs better in some cases, and that the higher the compression ratio, the greater the benefit.
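The idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the token format (`("lit", key)` and `("ref", distance, length)`), the function names, and the unbounded window are all assumptions made for illustration, and S is simply hashed on its join key. Literal tokens probe the hash table on S, while back-references copy the cached join results of the earlier positions they point to, so no probe is issued for them.

```python
from collections import defaultdict


def build_hash_table(s_keys):
    """Map each join key in S to the list of S row positions holding it."""
    table = defaultdict(list)
    for pos, key in enumerate(s_keys):
        table[key].append(pos)
    return table


def lz_join(tokens, s_keys):
    """Join an LZ77-style encoded key column of R against S while decoding.

    tokens: hypothetical token stream, either ("lit", key) or
            ("ref", distance, length) pointing back into the decoded output.
    Returns (list of (r_pos, s_pos) pairs, number of hash probes issued).
    """
    table = build_hash_table(s_keys)
    decoded = []  # decoded R keys; stands in for the LZ window (unbounded here)
    cached = []   # cached[i] holds the S positions that match decoded[i]
    probes = 0

    for token in tokens:
        if token[0] == "lit":
            key = token[1]
            decoded.append(key)
            cached.append(table.get(key, []))  # one real probe per literal
            probes += 1
        else:
            _, distance, length = token
            start = len(decoded) - distance
            for i in range(start, start + length):
                # Repeated sequence: reuse the cached result, no probe.
                decoded.append(decoded[i])
                cached.append(cached[i])

    results = [(r_pos, s_pos)
               for r_pos, matches in enumerate(cached)
               for s_pos in matches]
    return results, probes


# R's key column decodes to [7, 3, 9, 7, 3, 9, 3, 9]; S holds keys [3, 7, 7].
tokens = [("lit", 7), ("lit", 3), ("lit", 9), ("ref", 3, 3), ("ref", 2, 2)]
results, probes = lz_join(tokens, [3, 7, 7])
print(probes)  # 3 probes for 8 rows of R
```

In this toy input only the three literals trigger a probe; the five rows produced by back-references reuse cached results, which is why a higher compression ratio (more and longer back-references) leaves fewer probes to perform.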
Heavy-weight compression join; LZ join; Column-oriented database; compression in database; LZ encoding
Gan Liang, Li RunHeng, Jia Yan, Jin Xin
School of Computer Science, National University of Defense Technology, 410073 ChangSha, HuNan, China; School of Software, ChangSha Social Work College, 410004 ChangSha, HuNan, China
International conference
11th International Conference, WAIM 2010 (11th International Conference on Web-Age Information Management)
Jiuzhaigou
English
357-362
2010-07-14 (date first posted online on the Wanfang platform; not the paper's publication date)