Measuring Code Similarity using Word Mover's Distance for Programming Course
Teachers increasingly ask students to submit their assignments online, not only in online courses but also in face-to-face courses. Plagiarism is becoming a more and more serious problem because resources are so easy to find on the Internet, especially in computer programming courses. This paper aims to develop a robust automated technique for detecting code plagiarism in programming courses. After analyzing and summarizing the state of the art in code plagiarism detection, we develop a more robust detection technique that combines word2vec with the Word Mover's Distance (WMD) similarity metric. We consider the different plagiarism methods students use when submitting program source code. We then collect more than 20,000 code submissions from our introductory C++ programming course for non-major students and manually check whether each one is plagiarized. In the process, we examine how our proposed method compares with two other mainstream algorithms and how suitable each is for different plagiarism characteristics. The results obtained on the dataset indicate that our approach is well suited to detecting different types of code plagiarism. We conclude that incorporating the WMD similarity metric is crucial for improved effectiveness and adaptability.
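The abstract names word2vec embeddings combined with the Word Mover's Distance (WMD) as the similarity metric but gives no algorithmic detail. The following Python sketch shows how WMD between two code submissions could be computed. It assumes a crude identifier-only tokenizer and a small hypothetical embedding table `TOY_EMBEDDINGS` in place of a trained word2vec model; `tokenize`, `wmd`, and the token vectors are illustrative assumptions, not the authors' implementation. Each submission is treated as a normalized bag of tokens, and WMD is the minimum total cost of moving one bag onto the other, solved exactly here as a small min-cost flow.

```python
import math
import re
from collections import Counter

# Hypothetical 2-D vectors standing in for a word2vec model trained on code.
# Real embeddings would have tens or hundreds of dimensions.
TOY_EMBEDDINGS = {
    "for": (0.0, 1.0), "int": (1.0, 0.0),
    "i": (0.5, 0.5), "k": (0.55, 0.5), "n": (0.6, 0.2), "a": (0.3, 0.1),
    "sum": (0.8, 0.8), "total": (0.75, 0.85),
    "std": (-0.8, 0.4), "cout": (-1.0, 0.5), "endl": (-0.9, 0.6),
}


def tokenize(code):
    """Crude lexer: keep identifier and keyword tokens, drop literals and symbols."""
    return re.findall(r"[A-Za-z_]\w*", code)


def wmd(tokens_a, tokens_b, emb):
    """Exact Word Mover's Distance between two token lists.

    Each document becomes a normalized bag of words (tokens missing from `emb`
    are ignored). Moving one unit of mass from word u to word v costs the
    Euclidean distance between their vectors. Weights are scaled by
    n_a * n_b so the transportation problem becomes an integer min-cost flow,
    solved with successive shortest paths (Bellman-Ford).
    """
    ca = Counter(t for t in tokens_a if t in emb)
    cb = Counter(t for t in tokens_b if t in emb)
    na, nb = sum(ca.values()), sum(cb.values())
    if na == 0 or nb == 0:
        return math.inf
    words_a, words_b = list(ca), list(cb)
    src, off_b = 0, 1 + len(words_a)  # node 0 = source, then words of A, then words of B
    sink = off_b + len(words_b)
    graph = [[] for _ in range(sink + 1)]  # node -> indices into `edges`
    edges = []  # [to, residual capacity, cost]; edge e ^ 1 is the reverse of e

    def add_edge(u, v, cap, cost):
        graph[u].append(len(edges))
        edges.append([v, cap, cost])
        graph[v].append(len(edges))
        edges.append([u, 0, -cost])

    for i, w in enumerate(words_a):
        add_edge(src, 1 + i, ca[w] * nb, 0.0)  # scaled supply of word w in A
    for j, w in enumerate(words_b):
        add_edge(off_b + j, sink, cb[w] * na, 0.0)  # scaled demand of word w in B
    for i, u in enumerate(words_a):
        for j, v in enumerate(words_b):
            add_edge(1 + i, off_b + j, na * nb, math.dist(emb[u], emb[v]))

    total, remaining = 0.0, na * nb
    while remaining > 0:
        # Bellman-Ford: residual reverse edges may carry negative costs.
        dist = [math.inf] * (sink + 1)
        prev = [-1] * (sink + 1)
        dist[src] = 0.0
        for _ in range(sink):
            changed = False
            for u in range(sink + 1):
                if dist[u] == math.inf:
                    continue
                for e in graph[u]:
                    v, cap, cost = edges[e]
                    if cap > 0 and dist[u] + cost < dist[v] - 1e-12:
                        dist[v], prev[v], changed = dist[u] + cost, e, True
            if not changed:
                break
        # Push the bottleneck amount along the cheapest source-to-sink path.
        push, v = remaining, sink
        while v != src:
            push = min(push, edges[prev[v]][1])
            v = edges[prev[v] ^ 1][0]
        v = sink
        while v != src:
            edges[prev[v]][1] -= push
            edges[prev[v] ^ 1][1] += push
            v = edges[prev[v] ^ 1][0]
        total += push * dist[sink]
        remaining -= push
    return total / (na * nb)


if __name__ == "__main__":
    original = "for (int i = 0; i < n; i++) sum += a[i];"
    renamed = "for (int k = 0; k < n; k++) total += a[k];"  # identifiers renamed
    unrelated = "std::cout << n << std::endl;"
    base = tokenize(original)
    print("renamed:  ", wmd(base, tokenize(renamed), TOY_EMBEDDINGS))
    print("unrelated:", wmd(base, tokenize(unrelated), TOY_EMBEDDINGS))
```

With these toy vectors, the copy whose identifiers were renamed stays much closer to the original than the unrelated snippet, because each renamed token only has to travel a short distance in embedding space. This tolerance to renaming is what makes an embedding-based transport metric attractive for plagiarism detection. For realistic vocabulary sizes, an optimized solver (for example gensim's `KeyedVectors.wmdistance`) would be far faster than this pure-Python flow.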
Bin Xu Fan Gao Kening Gao Changkuan Zhao Dan Yang
Northeastern University, Shenyang, Liaoning; Northeastern University, Shenyang, Liaoning, China
International conference
2019 China Turing Conference (ACM Turing Celebration Conference - China 2019)
Chengdu
English
pp. 147-148
2019-05-17 (date first posted online on the Wanfang platform; not the paper's publication date)