Proactive Failure Management for High Availability Computing in Computer Clusters
In this paper, we propose a framework for autonomic failure management with hierarchical failure prediction functionality for coalition clusters. The framework analyzes node-, cluster-, and system-wide failure behaviors and forecasts prospective failure occurrences based on quantified failure dynamics. The predictor also inspects failure correlations. Experimental results from a campus computational grid show that the offline and online predictions by our predictors accurately forecast the failure trend and capture failure correlations in a coalition cluster environment.
Ziming Zhang, Song Fu
Department of Computer Science and Engineering, New Mexico Institute of Mining and Technology, USA
International conference
Huangshan
English
377-381
2010-05-28 (date first posted online on the Wanfang platform; not the paper's publication date)