Fault-Tolerant Technique in the Cluster Computation of the Digital Watershed Model
This paper describes a parallel computing platform for the digital watershed model, built on existing facilities. A distributed multi-layered structure is applied to the computer cluster system, and MPI-2 is adopted as a mature parallel programming standard. An agent is introduced that enables multi-level fault tolerance in software development. A communication protocol based on a checkpointing and rollback recovery mechanism supports transaction reprocessing. Compared with the conventional platform, the new system makes better use of computing resources. Experimental results show that the speedup ratio of the platform is almost 4 times that of the conventional one, which demonstrates the high efficiency and good performance of the new approach.
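The checkpointing and rollback recovery idea mentioned in the abstract can be illustrated with a minimal single-process sketch. It is not the authors' protocol and does not use MPI-2; the names `CheckpointedWorker`, `make_flaky`, `checkpoint_every`, and `max_rollbacks` are hypothetical, and a node failure is simulated with a `RuntimeError`. On failure, the worker restores its last saved state and reprocesses the tasks completed since that checkpoint.

```python
import copy


class CheckpointedWorker:
    """Hypothetical worker that saves state periodically and rolls back on failure."""

    def __init__(self):
        self.state = {"next_task": 0, "results": []}
        self._saved = copy.deepcopy(self.state)  # initial checkpoint
        self.rollbacks = 0

    def checkpoint(self):
        # Save a deep copy so later mutations do not alter the checkpoint.
        self._saved = copy.deepcopy(self.state)

    def rollback(self):
        # Restore the last checkpoint; work done since then is reprocessed.
        self.state = copy.deepcopy(self._saved)
        self.rollbacks += 1

    def process(self, tasks, compute, checkpoint_every=2, max_rollbacks=10):
        while self.state["next_task"] < len(tasks):
            i = self.state["next_task"]
            try:
                result = compute(tasks[i])
            except RuntimeError:
                if self.rollbacks >= max_rollbacks:
                    raise  # give up instead of looping forever
                self.rollback()
                continue
            self.state["results"].append(result)
            self.state["next_task"] = i + 1
            if self.state["next_task"] % checkpoint_every == 0:
                self.checkpoint()
        return self.state["results"]


def make_flaky(fail_on, failures=1):
    """Return a compute function that fails `failures` times on input `fail_on`."""
    remaining = {"n": failures}

    def compute(x):
        if x == fail_on and remaining["n"] > 0:
            remaining["n"] -= 1
            raise RuntimeError("simulated node failure")
        return x * x

    return compute
```

Calling `CheckpointedWorker().process([1, 2, 3, 4, 5], make_flaky(fail_on=4))` triggers one rollback to the checkpoint taken after the second task, so the third task is computed twice before the run completes.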
digital watershed model; computer cluster; MPI-2; fault-tolerant
SHANG Yizi, WU Baosheng, LI Tiejian, FANG Shenguang
State Key Laboratory of Hydroscience and Engineering, Tsinghua University, Beijing 100084, China
Domestic conference
Beijing
English
162-168
2007-07-15 (date first posted online on the Wanfang platform; not the paper's publication date)