A Comparative Study of Discretization Methods for Bayesian Network in Software Estimation

Recently, Bayesian Networks (BN) have become a research focus in the field of software effort estimation, and many papers have reported the superiority of their estimation accuracy. However, the choice of discretization method when building a BN model is seldom discussed, even though data discretization is an important step of the modeling process and may lead to significantly different estimates even on the same dataset and BN structure. In this paper, we develop a BN model based on previous literature. Within this model, we compare the effect of three discretization methods (equal width, equal frequency, and k-means) on the estimated result, combined with different numbers of discretization categories. These empirical studies are performed on a subset of the ISBSG R8 dataset. Estimation accuracy is measured with commonly used metrics. Qualitative analysis shows that the k-means discretization algorithm can clearly improve estimation accuracy, especially with 5 discrete categories. The results also show an advantage of k-means discretization in its monotonicity: the accuracy metrics improve as the number of discretization categories approaches the optimal value.
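The three discretization methods compared in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names (`equal_width`, `equal_frequency`, `kmeans_1d`), the quantile-based initialisation of the k-means centres, and the sample `effort` values are all assumptions made for illustration, and the data are invented rather than taken from ISBSG R8.

```python
import numpy as np

def equal_width(values, k):
    """Split the value range into k intervals of equal width."""
    values = np.asarray(values, dtype=float)
    edges = np.linspace(values.min(), values.max(), k + 1)
    # Only the k-1 interior edges are needed; labels run from 0 to k-1.
    return np.digitize(values, edges[1:-1])

def equal_frequency(values, k):
    """Split at quantiles so each bin holds roughly the same number of points."""
    values = np.asarray(values, dtype=float)
    edges = np.quantile(values, np.linspace(0, 1, k + 1))
    return np.digitize(values, edges[1:-1], right=True)

def kmeans_1d(values, k, iters=100):
    """One-dimensional k-means (Lloyd's algorithm); bins are the clusters."""
    values = np.asarray(values, dtype=float)
    # Assumption: deterministic start at evenly spaced quantiles.
    centres = np.quantile(values, (np.arange(k) + 0.5) / k)
    for _ in range(iters):
        labels = np.argmin(np.abs(values[:, None] - centres[None, :]), axis=1)
        updated = np.array([
            values[labels == j].mean() if np.any(labels == j) else centres[j]
            for j in range(k)
        ])
        if np.allclose(updated, centres):
            break
        centres = updated
    # Final assignment, relabelled so bin index grows with the centre value.
    labels = np.argmin(np.abs(values[:, None] - centres[None, :]), axis=1)
    rank = np.empty(k, dtype=int)
    rank[np.argsort(centres)] = np.arange(k)
    return rank[labels]

# Invented, strongly skewed sample (hypothetical effort values).
effort = [1, 2, 3, 4, 5, 100, 110, 120, 1000]
for name, fn in [("equal width", equal_width),
                 ("equal frequency", equal_frequency),
                 ("k-means", kmeans_1d)]:
    print(name, fn(effort, 3).tolist())
```

On a skewed sample like this one, equal width places nearly all values in the first bin, equal frequency balances the bin counts regardless of gaps in the data, and k-means groups values that lie close together. This difference in bin boundaries is the kind of effect the study examines.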
Software effort estimation; Bayesian Network; Discretization algorithms
Wei Luo, Qin Liu, Bo Zhao
School of Software Engineering, Tongji University, Shanghai, China; Department of Marine Control Systems, Norway University of Science and Technology, Trondheim, Norway
International conference
The 10th International Conference on Intelligent Technologies (InTech09)
Guilin
English
542-547
2009-12-12 (date first made available online on the Wanfang platform; not the paper's publication date)