A Data Mining Based Method: Detecting Software Defects in Source Code
With the expansion of software size and complexity, detecting defects has become a challenging problem. This paper proposes a defect detection method that applies data mining techniques to source code to detect two types of defects in one process: rule-violating defects and copy-paste related defects, which may include semantic defects. During the process, the method can also extract implicit programming rules without prior knowledge of the software and detect copy-paste segments at different granularities. The method is evaluated on the Linux kernel, which contains more than 4 million lines of C code. The results show that the resulting system can quickly detect many programming rules and violations of those rules. The novel pruning techniques effectively eliminate a large number of false positives, greatly reducing the effort of manually checking violations. As an illustrative example of its effectiveness, a case study shows that among the top 50 violations reported by the proposed model, 11 defects can be confirmed after examining the source code.
defect detection; data mining; programming rule; copy-paste; false positive
Yuehua Zhang, Ying Liu, Lingling Zhang, Yong Shi
Fictitious Economy and Data Sciences Research Center, Chinese Academy of Sciences, Beijing 100190, Chi
International conference
Chengdu
English
569-574
2010-06-23 (date first posted online on the Wanfang platform; does not represent the paper's publication date)