Detection of Simple Plagiarism in Computer Science Papers

Abstract:

Plagiarism is the use of the language and thoughts of another work and their representation as one's own original work. Various levels of plagiarism exist in many domains in general and in academic papers in particular. Consequently, diverse efforts have been made to identify plagiarism automatically. In this research, we developed software capable of detecting simple plagiarism. We built a corpus (C) containing 10,100 academic papers in computer science written in English, and two test sets consisting of papers randomly chosen from C. A wide variety of baseline methods has been developed to identify identical or similar papers, several of which are novel. The experimental results and their analysis reveal interesting findings: some of the novel methods are among the best predictive methods.
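The abstract does not say which baseline methods were used, so the following is only an illustrative sketch of one common kind of baseline for flagging identical or near-identical papers, not the authors' method. It assumes word trigram overlap measured with Jaccard similarity; the function names `word_ngrams`, `jaccard`, and `flag_similar`, the n-gram size, and the 0.5 threshold are all hypothetical choices.

```python
import re


def word_ngrams(text: str, n: int = 3) -> set[tuple[str, ...]]:
    """Lowercase the text, split it into alphanumeric word tokens,
    and return the set of contiguous word n-grams (hypothetical helper)."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity |A & B| / |A | B|; defined here as 0.0 when both sets are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def flag_similar(test_doc: str, corpus: dict[str, str],
                 n: int = 3, threshold: float = 0.5) -> list[tuple[str, float]]:
    """Compare a test document against every corpus document and return the
    (doc_id, score) pairs whose n-gram Jaccard score meets the assumed threshold,
    highest score first."""
    query = word_ngrams(test_doc, n)
    scored = [(doc_id, jaccard(query, word_ngrams(text, n)))
              for doc_id, text in corpus.items()]
    hits = [(doc_id, score) for doc_id, score in scored if score >= threshold]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)
```

For example, with a toy corpus `{"p1": "Plagiarism is the use of the language and thoughts of another work.", "p2": "We study graph algorithms for routing in sensor networks."}`, querying with a lowercased copy of the first sentence flags only `p1`, with a score of 1.0. Word n-grams ignore punctuation and case, so verbatim copies score 1.0, while light edits lower the score only gradually.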

Authors: Yaakov HaCohen-Kerner, Aharon Tayeb, Natan Ben-Dror

Affiliation: Department of Computer Science, Jerusalem College of Technology (Machon Lev)

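Conference type: International conference

Conference name: The 23rd International Conference on Computational Linguistics

Conference location: Beijing

Conference language: English

Pages: 421-429

Online publication date: 2010-08-01 (the date the paper first went online on the Wanfang platform, not its publication date)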
