Conference topic

PLDSRC: A Multi-threaded Compressor/Decompressor for Massive DNA Sequencing Data

  Given the rapid growth of DNA sequencing data, it is of great importance to study high-efficiency compression techniques that reduce the cost of storing massive amounts of sequencing data. In this paper, we propose a parallel DNA data compressor/decompressor, PLDSRC, based on the well-known serial DSRC software. We first analyze the compression and decompression algorithms in DSRC and identify three basic operations, namely read, work, and write. A single-pipeline parallel algorithm is then proposed to accelerate the compression/decompression procedure. To further exploit today's popular multi-core, multi-socket systems based on the non-uniform memory access (NUMA) architecture, we extend the single-pipeline approach to the multi-pipeline case. Experiments on two different platforms show that PLDSRC, in both its single- and multi-pipeline forms, speeds up DNA sequencing data compression/decompression greatly while maintaining the same compression ratio. Examples indicate that the maximum speedup of PLDSRC for compression and decompression is around 24.71x and 22.00x, respectively, compared to the serial DSRC software.
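The read/work/write decomposition described in the abstract can be illustrated with a minimal single-pipeline sketch. This is not the PLDSRC implementation: `zlib` stands in for the DSRC block compressor, and the function name `pipeline_compress`, the `num_workers` parameter, and the queue sizes are all hypothetical choices made for illustration. One reader thread feeds numbered blocks to several worker threads, and one writer thread restores the original block order, so the output matches what a serial run would produce.

```python
import queue
import threading
import zlib

SENTINEL = None  # end-of-stream marker passed through both queues


def pipeline_compress(blocks, num_workers=4):
    """Compress an iterable of byte blocks with a read/work/write pipeline.

    Assumption: zlib is only a stand-in for the DSRC block compressor.
    Returns the compressed blocks in the same order as the input.
    """
    work_q = queue.Queue(maxsize=2 * num_workers)   # reader -> workers
    write_q = queue.Queue(maxsize=2 * num_workers)  # workers -> writer
    output = []

    def reader():
        # "read": tag each block with its position so order can be restored.
        for index, block in enumerate(blocks):
            work_q.put((index, block))
        for _ in range(num_workers):
            work_q.put(SENTINEL)  # one stop marker per worker

    def worker():
        # "work": compress blocks independently of each other.
        while True:
            item = work_q.get()
            if item is SENTINEL:
                write_q.put(SENTINEL)
                return
            index, block = item
            write_q.put((index, zlib.compress(block)))

    def writer():
        # "write": buffer out-of-order results and emit them in input order.
        pending = {}
        next_index = 0
        finished_workers = 0
        while finished_workers < num_workers:
            item = write_q.get()
            if item is SENTINEL:
                finished_workers += 1
                continue
            index, compressed = item
            pending[index] = compressed
            while next_index in pending:
                output.append(pending.pop(next_index))
                next_index += 1

    threads = [threading.Thread(target=reader), threading.Thread(target=writer)]
    threads += [threading.Thread(target=worker) for _ in range(num_workers)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return output
```

Because each block is compressed independently and written back in input order, the sketch yields the same bytes as compressing the blocks one after another, which mirrors the abstract's claim that parallelization leaves the compression ratio unchanged. A multi-pipeline variant would run several such pipelines side by side, for example one per NUMA node; that extension is not shown here.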

DNA sequencing compression; DSRC; PLDSRC; Multi-pipeline; NUMA

Ke Zhan, Chao Yang, Changyou Zhang, Jingjing Zheng, Ting Wang

Institute of Software, Chinese Academy of Sciences, Beijing 100190, China; University of Chinese Academy Institute of Software, Chinese Academy of Sciences, Beijing 100190, China; State Key Laboratory of Compu Institute of Software, Chinese Academy of Sciences, Beijing 100190, China

International conference

The 13th International Symposium on Distributed Computing and Applications to Business, Engineering and Science (DCABES 2014)

Xianning, Hubei, China

English

29-33

2014-11-24 (date first posted online on the Wanfang platform; not the paper's publication date)