Semantics-based text copy detection method supporting similarity ranking
Text documents are the most widely used medium on the Internet. However, they give rise to problems that cannot be neglected, such as plagiarism, reproduction of information content, illicit redistribution, and copyright disputes. Plagiarists have become increasingly "clever": they can rewrite content using synonym substitution, syntactic variation, and other methods. Traditional copy detection methods, which rely on exact matching or approximate string matching algorithms, do not work when the copy is semantics-based. To meet the challenge of semantics-based copy detection, this paper proposes, for the first time, a semantics-based copy detection method that supports similarity ranking. Similarity scores between the suspicious text and each text in the corpus are calculated using the proposed similarity calculation method. Finally, the top-k corpus texts with the highest similarity scores to the suspicious text are ranked and listed in descending order of score. Experiments on a real-world dataset further show that the proposed solution is efficient and effective in supporting semantics-based copy detection.
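The score-then-rank flow described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the paper's similarity calculation, so this sketch swaps in a stand-in: Jaccard similarity over word sets after mapping synonyms to a canonical form. The `SYNONYMS` table and the functions `normalize`, `similarity`, and `top_k` are hypothetical names introduced only for illustration and are not taken from the paper.

```python
import heapq

# Hypothetical synonym table (an assumption for illustration):
# maps each word to a canonical representative.
SYNONYMS = {
    "automobile": "car",
    "vehicle": "car",
    "purchase": "buy",
    "quick": "fast",
}


def normalize(text):
    """Lowercase the words, strip surrounding punctuation, and
    replace each word with its canonical synonym if one is listed."""
    words = (w.strip(".,;:!?\"'").lower() for w in text.split())
    return {SYNONYMS.get(w, w) for w in words if w}


def similarity(a, b):
    """Stand-in score: Jaccard similarity of the normalized word sets.
    This is NOT the paper's similarity calculation method."""
    sa, sb = normalize(a), normalize(b)
    if not sa and not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


def top_k(suspicious, corpus, k):
    """Score every corpus text against the suspicious text and return
    the k best (score, corpus_index) pairs in descending order of score."""
    scored = ((similarity(suspicious, doc), i) for i, doc in enumerate(corpus))
    return heapq.nlargest(k, scored)
```

Under these assumptions, a call such as `top_k(suspicious_text, corpus_texts, 2)` returns the two highest-scoring corpus entries, best first. Because synonyms are collapsed before comparison, a rewrite that only substitutes listed synonyms scores the same as a verbatim copy, which exact string matching would miss.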
Plagiarism; Semantics-based copy detection; Semantics similarity; Plain text keyword extraction
FU Zhangjie, SUN Xingming, ZHOU Lu, HUANG Fengxiao
School of Computer and Software & Jiangsu Engineering Center of Network Monitoring, Nanjing University of Information Science and Technology, Nanjing 210044
Domestic conference
Wuhan
English
421-431
2015-03-28 (date first posted online on the Wanfang platform; this does not represent the paper's publication date)