A Data Mining Based Method: Detecting Software Defects in Source Code

Abstract:

With the growth in software size and complexity, detecting defects has become a challenging problem. This paper proposes a defect detection method that applies data mining techniques to source code to detect two types of defects in a single process: rule-violating defects and copy-paste related defects, which may include semantic defects. During this process, the method can also extract implicit programming rules without prior knowledge of the software and detect copy-pasted segments at different granularities. The method is evaluated on the Linux kernel, which contains more than 4 million lines of C code. The results show that the resulting system can quickly detect many programming rules and violations of those rules. The novel pruning techniques eliminate a large number of false positives, which greatly reduces the effort of manually checking violations. As an illustration of its effectiveness, a case study shows that among the top 50 violations reported by the proposed model, 11 defects can be confirmed after examining the source code.
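The abstract does not describe the mining algorithm itself, so the following is only a rough illustration of what "extracting implicit programming rules and detecting violations" can look like. It is not the authors' method. The sketch makes three assumptions. First, each function is reduced to the set of identifiers it calls. Second, pairwise rules "if a function calls A, it also calls B" are mined by support and confidence, in a frequent-itemset style. Third, any function that calls A without B is reported as a candidate violation, ranked by rule confidence. The names `mine_pair_rules` and `find_violations`, the input format, and the thresholds are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def mine_pair_rules(functions, min_support=3, min_confidence=0.8):
    """Mine hypothetical 'lhs implies rhs' rules from per-function call sets.

    functions: dict mapping a function name to the set of identifiers it calls
    (an assumed input format, not taken from the paper).
    Returns a list of (lhs, rhs, support, confidence) tuples.
    """
    item_count = Counter()  # number of functions calling each identifier
    pair_count = Counter()  # number of functions calling both identifiers of a pair
    for calls in functions.values():
        item_count.update(calls)
        pair_count.update(combinations(sorted(calls), 2))

    rules = []
    for (a, b), support in pair_count.items():
        if support < min_support:
            continue
        # Check both directions: a -> b and b -> a.
        for lhs, rhs in ((a, b), (b, a)):
            confidence = support / item_count[lhs]
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return rules


def find_violations(functions, rules):
    """Report functions that call a rule's lhs but not its rhs, highest confidence first."""
    violations = []
    for name, calls in functions.items():
        for lhs, rhs, _support, confidence in rules:
            if lhs in calls and rhs not in calls:
                violations.append((name, lhs, rhs, confidence))
    violations.sort(key=lambda v: -v[3])
    return violations
```

For example, suppose four functions each call both `spin_lock` and `spin_unlock` and a fifth calls only `spin_lock`. With `min_support=3` and `min_confidence=0.75`, the sketch mines the rule `spin_lock` implies `spin_unlock` with confidence 4/5 and flags the fifth function. Ranking by confidence loosely mirrors the idea of reviewing the "top" violations first. The paper's pruning techniques for removing false positives are not modeled here.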

Keywords: defect detection; data mining; programming rule; copy-paste; false positive

Authors: Yuehua Zhang, Ying Liu, Lingling Zhang, Yong Shi

Author affiliation: Fictitious Economy and Data Sciences Research Center, Chinese Academy of Sciences, Beijing 100190, China

Conference type: International conference

Conference name: The 2nd International Conference on Software Engineering and Data Mining (IEEE, SEDM 2010)

Conference location: Chengdu

Conference language: English

Pages: 569-574

Published online: 2010-06-23 (date first posted on the Wanfang platform; not the paper's publication date)
